This book presents an introductory account of stochastic processes, estimation
theory, and image enhancement. It is primarily intended for first-year
graduate students and practicing engineers and scientists whose work requires
an acquaintance with the theory. The subject matter has evolved from a
course given at the graduate level in the Department of Electrical Engineering
at the University of Southern California.

The mathematical background assumed of the reader includes concepts of
elementary probability theory, the ability to use Fourier and Laplace
transforms, and an understanding of the basic ideas of linear system theory.
Familiarity with linear algebra is helpful but not essential. There is, in
general, no substitute for a rigorous mathematical treatment; however, it is
felt that the concepts and the important ideas to be presented may be obscured
if too many mathematical details are included. Nevertheless, the book is not a
“cookbook”; the definitions and theorems are carefully stated.

The approach to and coverage of the material found here were heavily 
influenced by the author’s practical experience with problems encountered at 
the Jet Propulsion Laboratory concerning pointing accuracies of science 
instruments for various spacecraft. It is, therefore, hoped that the book will be 
useful to a large class of engineers and scientists working in the areas of guidance 
and control, communications, or other disciplines involving stochastic processes, 
estimation theory, and image enhancement. 

To make the book self-contained, the first chapter reviews the fundamental
concepts of probability that are required to support the main topics. The
appendices discuss the remaining mathematical background. The reader is
advised to review the appropriate sections before attempting the problems at the
end of each chapter. There are many examples scattered throughout the text,
and the problems at the end of each chapter must be considered an integral
part of the material. It is emphasized that the notation is generally
independent from one chapter to the other.

I wish to thank George Pace and Walter Havens for their encouragement. 
Thanks are due Michael Griffin and George Jaivin for their editorial comments. 
Finally, I wish to thank Professor Nasser Nahi for allowing me to teach the
course, upon which this book is based, at the University of Southern California.

CHAPTER 1

REVIEW OF PROBABILITY


1.1 INTRODUCTION

The concept of probability is used in a wide variety of scientific fields, such
as genetics, control, communication, econometrics, and many others. In what
follows the fundamental concepts of probability are discussed. References
[1]-[10] were utilized in the composition of this chapter.

1.2 SAMPLE SPACE, EVENTS, AND BASIC CONCEPTS 
OF PROBABILITY 

1.2.1 Sample Space 

Consider an experiment denoted by $\mathcal{E}$. By sample space, we mean the
set of all outcomes of $\mathcal{E}$, which is denoted by S. The set S is also
called the universal set.

Example 1

Let $\mathcal{E}$ be the experiment of tossing a die and observing the number
shown on top. The sample space S is given by:

$$S = \{1, 2, 3, 4, 5, 6\}$$


1.2.2 Events

An event A is a subset of S, i.e., A is a set of some outcomes which are
members of S. Note that if A and B are events, so are $A \cup B$, $A \cap B$, etc.

Definition 1

Two events A and B are mutually exclusive if there is no way that they
can occur simultaneously, i.e., $A \cap B = \varnothing$, where $\varnothing$
denotes the empty set.


1.2.3 Basic Concepts of Probability 

Let S be a sample space associated with the experiment $\mathcal{E}$. With each
event A we associate a real number denoted by P(A) and define it as the
probability of A. The following conditions must be satisfied:

(1) $0 \le P(A) \le 1$

(2) $P(S) = 1$

(3) If $A \cap B = \varnothing$, then

$$P(A \cup B) = P(A) + P(B)$$

(4) If $A_1, A_2, \ldots$ are mutually exclusive events, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$

1.2.4 Some Important Results 

The following results are true and are left as exercises:

(1) $P(\varnothing) = 0$

(2) $P(\bar{A}) = 1 - P(A)$, where $\bar{A}$ is the complement of A

(3) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

1.3 CONDITIONAL PROBABILITY, TOTAL 
PROBABILITY, BAYES’ THEOREM, AND 
STATISTICAL INDEPENDENCE 

1.3.1 Conditional Probability 

Let A and B be two events. Then $P(A \mid B)$ denotes the probability of
event A given that B has occurred and is defined as:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) \neq 0 \tag{1.1}$$

1.3.2 Total Probability and Bayes’ Theorem

Given a sample space S associated with the experiment $\mathcal{E}$ and given
events $A_1, A_2, \ldots, A_k$, we say $A_1, A_2, \ldots, A_k$ represents a
partition if the following conditions are satisfied:

(1) $A_i \cap A_j = \varnothing$, if $i \neq j$

(2) $\bigcup_{i=1}^{k} A_i = S$

(3) $P(A_i) > 0$, for all $i = 1, \ldots, k$

Now, let $A_1, \ldots, A_k$ be a partition and let B be an event. Then we can
easily show that:

$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \cdots + P(B \mid A_k)P(A_k) = \sum_{i=1}^{k} P(B \mid A_i)P(A_i) \tag{1.2}$$

The above result is called the theorem of total probability.

Utilizing the definition of conditional probability and taking advantage of
Eq. (1.2), we now get:

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(B \mid A_i)P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j)P(A_j)} \tag{1.3}$$

The above result is called Bayes’ theorem.

Example 2 

An electronic company producing transistor radios has three plants producing
15%, 35%, and 50% of the entire output, respectively. Assume the
probabilities that a radio produced by these plants is defective are 0.01, 0.05,
and 0.02, respectively. If a radio is chosen at random from the entire company,
what is the probability that it is defective?

Solution

Let

B = {x (radio) : x is defective}

$A_i$ = {x : x is chosen from plant i}

Using Eq. (1.2) yields:

$$P(B) = \sum_{i=1}^{3} P(B \mid A_i)P(A_i) = 0.01 \times 0.15 + 0.05 \times 0.35 + 0.02 \times 0.5 = 0.029$$


Example 3 

Assume a radio chosen at random is found to be defective. What is the
probability that it comes from plant 2?

Solution

From Bayes’ theorem given via Eq. (1.3),

$$P(A_2 \mid B) = \frac{P(B \mid A_2)P(A_2)}{\sum_{i=1}^{3} P(B \mid A_i)P(A_i)} = \frac{0.05 \times 0.35}{0.029} \approx 0.603$$
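The arithmetic of Examples 2 and 3 can be checked with a short script. The following is a minimal Python sketch, not part of the original text; the function names `total_probability` and `bayes_posterior` are illustrative assumptions, and the numbers are the plant shares and defect rates stated in Example 2.

```python
def total_probability(priors, likelihoods):
    """Theorem of total probability, Eq. (1.2): P(B) = sum_i P(B|A_i) P(A_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes_posterior(priors, likelihoods):
    """Bayes' theorem, Eq. (1.3): returns P(A_i|B) for every i."""
    p_b = total_probability(priors, likelihoods)
    return [p * l / p_b for p, l in zip(priors, likelihoods)]


priors = [0.15, 0.35, 0.50]        # P(A_i): share of output from plant i
likelihoods = [0.01, 0.05, 0.02]   # P(B|A_i): defect rate of plant i

p_defective = total_probability(priors, likelihoods)
posteriors = bayes_posterior(priors, likelihoods)

print(round(p_defective, 3))        # probability that a random radio is defective
print(round(posteriors[1], 3))      # probability that a defective radio is from plant 2
```

The posterior probabilities sum to one, which is a quick sanity check that the events $A_i$ really form a partition.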


1.3.3 Statistical Independence 

Two random events A and B are independent if and only if

$$P(A \cap B) = P(A)P(B)$$

In what follows, we shall define random variables, and probability distribution
and density functions.


1.4 RANDOM VARIABLES AND PROBABILITY
DISTRIBUTION AND DENSITY FUNCTIONS

1.4.1 Random Variables

Let $\mathcal{E}$ be an experiment and S be the corresponding sample space.
Then a random variable is a real function $X(\cdot)$ from S into the set of real
numbers, i.e., for every $\xi \in S$, $X(\xi)$ is real.

The choice of the term “random variable” is not very appropriate because
$X(\cdot)$ is a function, not a variable. However, we shall use the terminology
in order to be consistent with the literature. In general, the random variables
may be real or complex; however, unless specified otherwise, $X(\cdot)$ is
assumed to be real. The random variable may be continuous or discrete.


Example 4 

A fair coin is tossed three times. The sample space S is now considered to
be:

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

where H denotes head and T denotes tail. Define $X(\cdot)$ = number of heads.
Thus, X(HHH) = 3, X(HHT) = 2, etc. The random variable so defined is
discrete.
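The random variable of Example 4 can be tabulated by enumerating the sample space. The sketch below is an illustration only; the names `sample_space`, `X`, and `pmf` are assumptions introduced here, and the coin is taken to be fair so that each of the eight outcomes has probability 1/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin tosses: HHH, HHT, ..., TTT.
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]


def X(outcome):
    """Random variable: maps an outcome (a string of H/T) to its number of heads."""
    return outcome.count("H")


# Probability of each value of X, assuming equally likely outcomes.
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {value: Fraction(n, len(sample_space)) for value, n in sorted(counts.items())}

for value, prob in pmf.items():
    print(value, prob)
```

Here `X` is literally a function on the sample space, which is the point made above about the term “random variable.”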


1.4.2 Probability Distribution and Density Functions 

Let $X(\cdot)$ be a continuous (piecewise continuous) random variable. Then the
distribution function corresponding to $X(\cdot)$ is defined as:

$$F_X(\alpha) = P\{\xi \in S : X(\xi) \le \alpha\} \tag{1.4}$$

where $\alpha$ is a real number.

Before continuing the discussion, let us define the following notations:

(1) $[X \le x] \triangleq \{\xi \in S : X(\xi) \le x\}$

(2) $[X > x] \triangleq \{\xi \in S : X(\xi) > x\}$

(3) $[a < X \le b] \triangleq \{\xi \in S : a < X(\xi) \le b\}$

Thus, $F_X(\alpha)$ can now be written as:

$$F_X(\alpha) = P[X \le \alpha] \tag{1.5}$$

It is obvious that $F_X(\alpha)$ is a nondecreasing function.

Let us single out those random variables such that there exists a function
$f_X(\cdot) \ge 0$, where

$$F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\, d\alpha \tag{1.6}$$

The function $f_X(x)$ is called the probability density function (p.d.f.). If
$f_X(x)$ is continuous (piecewise continuous), utilizing the Fundamental Theorem
of Calculus, we obtain:

$$f_X(x) = \frac{dF_X(x)}{dx} \tag{1.7}$$

$f_X(x)$ is sometimes defined via Eq. (1.7).

It is also easy to verify the following properties:

(1) $P[a < X \le b] = \int_{a}^{b} f_X(t)\, dt = F_X(b) - F_X(a)$

(2) $F_X(\infty) = \int_{-\infty}^{\infty} f_X(t)\, dt = 1$

(3) $F_X(-\infty) = 0$

(4) If $f_X(x)$ is continuous, then

$$P[x < X \le x + \Delta x] = \int_{x}^{x + \Delta x} f_X(t)\, dt = \Delta x\, f_X(\xi)$$

where $\Delta x > 0$ and $x \le \xi \le x + \Delta x$ (using the Mean Value
Theorem of Integrals).

(5) $P[X > x] = 1 - P[X \le x] = 1 - F_X(x)$

(6) If $X(\cdot)$ is discrete, then $P(X_i) \ge 0$ and $\sum_{i} P(X_i) = 1$

Let us now define $F_X(\cdot)$ for the case where $X(\cdot)$ is a discrete random
variable:

$$F_X(x) = P[X \le x] = \sum_{X_i \le x} P(X_i)$$

Henceforth, we shall drop the subscript X from $F_X(\cdot)$ and $f_X(\cdot)$ if
there is no ambiguity about the random variable $X(\cdot)$.

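Some examples of common continuous distributions are given below.

(1) Uniform

$$f_X(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x > b \end{cases}$$

(2) Gaussian or Normal

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right], \qquad F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\, d\alpha$$

where m and $\sigma$ are parameters.

(3) Rayleigh

$$f_X(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x}{\alpha^2} \exp\left[-\dfrac{x^2}{2\alpha^2}\right], & x \ge 0 \end{cases}$$

If there are two random variables $X_1(\cdot)$ and $X_2(\cdot)$ with possible
outcomes $x_1$ and $x_2$, then a two-dimensional joint distribution function is
defined as:

$$F_{X_1X_2}(x_1, x_2) \triangleq P[X_1 \le x_1 \text{ and } X_2 \le x_2] \tag{1.8}$$

Similar to the one-dimensional case, the two-dimensional probability density
function $f_{X_1X_2}(x_1, x_2)$ is a function such that:

$$f_{X_1X_2}(x_1, x_2) = \frac{\partial^2 F_{X_1X_2}(x_1, x_2)}{\partial x_1\, \partial x_2} \tag{1.9}$$

whenever $\partial^2 F / \partial x_1 \partial x_2$ exists. It can easily be shown
that:

$$F_{X_1X_2}(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f_{X_1X_2}(\alpha_1, \alpha_2)\, d\alpha_2\, d\alpha_1 \tag{1.10}$$

The following properties are true for joint distributions:

(1) $F_{X_1X_2}(\infty, \infty) = 1$; $F_{X_1X_2}(-\infty, -\infty) = 0$

(2) $F_{X_1X_2}(x_1, x_2)$ is nondecreasing with respect to each argument

(3) $F_{X_1X_2}(\infty, x_2) = F_{X_2}(x_2)$ and $F_{X_1X_2}(x_1, \infty) = F_{X_1}(x_1)$

(4) $f_{X_1X_2}(x_1, x_2) \ge 0$, for all $x_1$ and $x_2$

(5) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X_1X_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2 = 1$

The distribution and the probability density functions $F_{X_1}(x_1)$ and
$f_{X_1}(x_1)$ are called marginal probability distribution and density
functions (statistics), respectively, and:

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} f_{X_1X_2}(x_1, x_2)\, dx_2 = \frac{dF_{X_1}(x_1)}{dx_1}$$

The marginal statistics $F_{X_2}(x_2)$ and $f_{X_2}(x_2)$ are defined in a similar
manner.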


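Let A and B be events such that:

$$A = [X_1 \le \alpha] \quad \text{and} \quad B = [\beta_1 < X_2 \le \beta_2]$$

Then from Eq. (1.1),

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{\int_{-\infty}^{\alpha} \int_{\beta_1}^{\beta_2} f_{X_1X_2}(x_1, x_2)\, dx_2\, dx_1}{\int_{\beta_1}^{\beta_2} f_{X_2}(x_2)\, dx_2} \tag{1.11}$$

where P(B) is assumed to be $\neq 0$.
Now, as $\beta_2 \to \beta_1 = \beta$,

$$F_{X_1}(\alpha \mid X_2 = \beta) = \lim_{\beta_2 \to \beta_1} P(A \mid B) = \frac{\int_{-\infty}^{\alpha} f_{X_1X_2}(x_1, \beta)\, dx_1}{f_{X_2}(\beta)} \tag{1.12}$$

The conditional p.d.f. $f_{X_1}(\alpha \mid X_2 = \beta)$ is given by:

$$f_{X_1}(\alpha \mid X_2 = \beta) = \frac{\partial F_{X_1}(\alpha \mid X_2 = \beta)}{\partial \alpha} \tag{1.13}$$

Utilizing Eq. (1.12) yields:

$$f_{X_1}(\alpha \mid X_2 = \beta) = \frac{f_{X_1X_2}(\alpha, \beta)}{f_{X_2}(\beta)} \tag{1.14}$$

In a similar manner, we can show:

$$f_{X_2}(\beta \mid X_1 = \alpha) = \frac{f_{X_1X_2}(\alpha, \beta)}{f_{X_1}(\alpha)} \tag{1.15}$$

By combining the last two equations,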


$$f_{X_1}(\alpha \mid \beta) = \frac{f_{X_2}(\beta \mid \alpha)\, f_{X_1}(\alpha)}{f_{X_2}(\beta)} \tag{1.16}$$

The last expression is called Bayes’ theorem for probability density
functions and it is similar to the Bayes’ theorem stated for probabilities.

The conditional density concepts can easily be extended to the vector case. 


1.5 FUNCTIONS OF RANDOM VARIABLES 

For the sake of simplicity we shall discuss the function of a single random
variable and then extend it to several variables.

Let $X(\cdot)$ be a random variable and let $g(\cdot)$ be a real valued function
such that

$$Y = g(X)$$

and suppose $F_X(x)$ and $f_X(x)$ are given. Let us find $F_Y(y)$ and $f_Y(y)$.
We shall give the results via the following theorem.

Theorem 1

Let g(x) be a piecewise continuously differentiable function, and suppose that
for every y there exist m points $x_1, x_2, \ldots, x_m$ such that

$$y = g(x_k), \qquad k = 1, 2, \ldots, m$$

and

$$g'(x_k) \neq 0, \qquad k = 1, 2, \ldots, m$$

Then the following will hold:

$$f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + \cdots + \frac{f_X(x_m)}{|g'(x_m)|} \tag{1.17}$$

The proof is not given here. However, the proof can be constructed as the
generalization of the case where $g(\cdot)$ is one-to-one and $g'(x) > 0$
$\forall$* x (or $g'(x) < 0$). For a proof, see references [1], [9], or [10].


Example 5

Let X and Y be random variables such that

$$Y = aX + b$$

where a and b are real constants with $a \neq 0$. Assuming $F_X(x)$ and
$f_X(x)$ are known, let us obtain $F_Y(y)$ and $f_Y(y)$.

Solution

$$F_Y(y) = P[Y \le y] = P[aX + b \le y] = \begin{cases} F_X\left(\dfrac{y - b}{a}\right), & a > 0 \\ 1 - F_X\left(\dfrac{y - b}{a}\right), & a < 0 \end{cases}$$

(the second case holding for continuous X). Now $f_Y(y)$ can be obtained via
Eq. (1.17). Thus,

$$f_Y(y) = \frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right)$$

* $\forall$ means “for all.”

Example 6

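Let X and Y be random variables such that:

$$Y = g(X) = X^2$$

Obtain $F_Y(y)$ and $f_Y(y)$ assuming $F_X(x)$ and $f_X(x)$ are known.

Solution

For $y \ge 0$,

$$F_Y(y) = P[Y \le y] = P[X^2 \le y] = P[-\sqrt{y} \le X \le \sqrt{y}] = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$

If $y > 0$, $f_Y(y)$ can be calculated as:

$$f_Y(y) = \frac{f_X(x_1)}{|g'(x_1)|} + \frac{f_X(x_2)}{|g'(x_2)|} = \frac{f_X(\sqrt{y})}{|2\sqrt{y}|} + \frac{f_X(-\sqrt{y})}{|2(-\sqrt{y})|}$$

Thus,

$$f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} \left[f_X(\sqrt{y}) + f_X(-\sqrt{y})\right], & \text{if } y > 0 \\ 0, & \text{otherwise} \end{cases} \tag{1.18}$$

which completes the problem.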
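The result of Example 6 can be sanity-checked on a case with a known answer. The sketch below is an illustration under a stated assumption: X is taken to be a zero-mean, unit-variance normal variable, for which the density of $Y = X^2$ is known in closed form (the chi-square density with one degree of freedom). The function names are hypothetical.

```python
import math


def normal_pdf(x, m=0.0, sigma=1.0):
    """Gaussian density with mean m and standard deviation sigma."""
    return math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def density_of_square(f_x, y):
    """Density of Y = X**2 from the two roots x = +sqrt(y) and x = -sqrt(y), Eq. (1.18)."""
    if y <= 0:
        return 0.0
    root = math.sqrt(y)
    return (f_x(root) + f_x(-root)) / (2 * root)


def chi_square_1_pdf(y):
    """Closed-form density of the square of a standard normal variable (y > 0)."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)


for y in (0.1, 1.0, 2.5):
    print(y, density_of_square(normal_pdf, y), chi_square_1_pdf(y))
```

The two columns agree, which confirms that both roots must be included in the sum of Theorem 1.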
Let X and Y be random variables with the joint p.d.f. $f_{XY}(x, y)$ and let

$$z = g(x, y) \quad \text{and} \quad w = h(x, y)$$

be real and continuously differentiable functions. We can obtain
$f_{ZW}(z, w)$ in terms of $f_{XY}(x, y)$. For the sake of simplicity, let us
assume that g(x, y) and h(x, y) are one-to-one functions. Then, it can be shown
that:

$$f_{ZW}(z, w) = \frac{f_{XY}(x, y)}{|J(x, y)|}, \qquad \text{assuming } J(x, y) \neq 0 \tag{1.19}$$

where x and y must be solved in terms of z and w, and J(x, y) is given by:

$$J(x, y) = \begin{vmatrix} \dfrac{\partial g(x, y)}{\partial x} & \dfrac{\partial g(x, y)}{\partial y} \\[2ex] \dfrac{\partial h(x, y)}{\partial x} & \dfrac{\partial h(x, y)}{\partial y} \end{vmatrix} \tag{1.20}$$

If there are m ordered pairs

$$(x_1, y_1), \ldots, (x_m, y_m)$$

such that

$$z = g(x_i, y_i) \quad \text{and} \quad w = h(x_i, y_i), \qquad i = 1, 2, \ldots, m$$

then Eq. (1.19) can be generalized by:

$$f_{ZW}(z, w) = \sum_{i=1}^{m} \frac{f_{XY}(x_i, y_i)}{|J(x_i, y_i)|}, \qquad \text{assuming } J(x_i, y_i) \neq 0 \text{ for all } i \tag{1.21}$$

The result can be extended to the general case, where we are dealing with an
n-dimensional random vector X.
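Let $X = (X_1, \ldots, X_n)$ and $Y = (Y_1, \ldots, Y_n)$ be random vectors such
that:

$$Y = h(X) \tag{1.22}$$

and, for the sake of simplicity, assume h is one-to-one, i.e., invertible.

Let g be the inverse function given by:

$$X = g(Y) = g(h(X)) \tag{1.23}$$

Let A and B be events such that $B = [Y \le y]$ and $A = [X \le g(y)]$.
Remember that the notation $[Y \le y]$ means $\{\xi \in S : Y_i(\xi) \le y_i$,
for all $i = 1, 2, \ldots, n\}$. It is obvious that

$$F_Y(y) = F_X(g(y))$$

since they both represent the same probability. Thus,

$$\int_{-\infty}^{y} f_Y(\alpha)\, d\alpha = \int_{-\infty}^{g(y)} f_X(\beta)\, d\beta \tag{1.24}$$

The last integral is actually:

$$\int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_n} f_Y(\alpha_1, \ldots, \alpha_n)\, d\alpha_1 \cdots d\alpha_n = \int_{-\infty}^{g_1(y)} \cdots \int_{-\infty}^{g_n(y)} f_X(\beta_1, \ldots, \beta_n)\, d\beta_1 \cdots d\beta_n \tag{1.25}$$

If we differentiate the integrals of Eq. (1.24) or (1.25) with respect to each
component of y, we obtain:

$$f_Y(y) = f_X(g(y)) \left| \frac{\partial g(y)}{\partial y} \right| \tag{1.26}$$

where $\partial g(y) / \partial y$ is the determinant of the Jacobian:

$$\frac{\partial g(y)}{\partial y} = \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial y_1} & \cdots & \dfrac{\partial g_n}{\partial y_n} \end{vmatrix}$$

Equation (1.26) can also be rewritten as (assuming $\partial g(y) / \partial y \neq 0$):

$$f_Y(y) = \frac{f_X(x)}{|J(x)|}, \qquad x = g(y) \tag{1.27}$$

where

$$J(x) = J(x_1, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial h_1}{\partial x_1} & \cdots & \dfrac{\partial h_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial h_n}{\partial x_1} & \cdots & \dfrac{\partial h_n}{\partial x_n} \end{vmatrix} = \left[ \frac{\partial g(y)}{\partial y} \right]^{-1}$$

If h is not a one-to-one function, the result can be extended in a manner
similar to Eq. (1.21).

1.6 SOME USEFUL DEFINITIONS AND CONCEPTS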

Let X be a random variable and $g(\cdot)$ be a real function. Then the
“expectation” or the “mean” of g(X) is defined as the Stieltjes integral:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x) \tag{1.28}$$

If the reader is not familiar with the Stieltjes integral, then Eq. (1.28), when
$F_X(x)$ is differentiable, would reduce to:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \tag{1.29}$$

which is used in most engineering books.

The “variance” of X is denoted as $\sigma_X^2$ and is defined as:

$$\sigma_X^2 = E[(X - m)^2] \tag{1.30}$$

where m = E[X], and $\sigma_X$ is called the “standard deviation.” It can be
shown that:

$$\sigma_X^2 = E[X^2] - m^2 \tag{1.31}$$

We shall also have the simple but useful inequalities:

$$P[|X| \ge K] \le \frac{E[|X|^n]}{K^n}$$

and

$$P[|X - m| \ge K\sigma_X] \le \frac{1}{K^2}$$

where K is a positive number and n is any positive integer such that
$E[|X|^n] < \infty$.
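The second inequality above bounds the probability that X deviates from its mean by at least K standard deviations by $1/K^2$. As a rough numerical illustration (not from the original text), the sketch below evaluates both sides for a fair six-sided die; the die and the names `mean`, `variance`, and `tail_probability` are assumptions chosen only for this check.

```python
import math
from fractions import Fraction

outcomes = range(1, 7)          # faces of a fair die
p = Fraction(1, 6)              # probability of each face

mean = sum(p * x for x in outcomes)                      # m = E[X]
variance = sum(p * (x - mean) ** 2 for x in outcomes)    # E[(X - m)^2], Eq. (1.30)
sigma = math.sqrt(variance)                              # standard deviation


def tail_probability(k):
    """P[|X - m| >= k * sigma] computed exactly over the six outcomes."""
    return sum(p for x in outcomes if abs(x - mean) >= k * sigma)


for k in (1.0, 1.2, 1.5, 2.0):
    print(k, float(tail_probability(k)), 1 / k ** 2)
```

For every K tried, the exact tail probability stays below the bound, which is usually far from tight.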
If $X(\cdot)$ is a random vector, then

$$X(\xi) = (X_1(\xi), X_2(\xi), \ldots, X_n(\xi))$$

where $\xi \in S$. The case where n = 2 and
$X(\xi) = (X_1(\xi), X_2(\xi)) = (x_1, x_2) = x_1 + jx_2$ is defined as the
complex random variable and it can be shown that:

$$E|X_1X_2| \le \left(E|X_1|^p\right)^{1/p} \left(E|X_2|^q\right)^{1/q} \tag{1.32}$$

where p and q are greater than 1 and (1/p) + (1/q) = 1. The above equation is
called the Hölder inequality.


For the special case, where p = q = 2, we get:

$$E|X_1X_2| \le \left(E|X_1|^2\right)^{1/2} \left(E|X_2|^2\right)^{1/2} \tag{1.33}$$

Equation (1.33) is called the Schwarz inequality and will be used often.


1.6.1 Covariance and Correlation Coefficient 

Let $m_i$ and $\sigma_i^2$ be the mean and the variance of $X_i$, and let us
define

$$\mu_{ij} \triangleq E[(X_i - m_i)(X_j - m_j)]$$

Then from the definition it is obvious that $\mu_{ii} = \sigma_i^2$, and, for
$i \neq j$, we call $\mu_{ij}$ the covariance of $X_i$ and $X_j$ and
$\rho_{ij}$, defined by:

$$\rho_{ij} = \frac{\mu_{ij}}{\sigma_i \sigma_j} \tag{1.34}$$

as the correlation coefficient between $X_i$ and $X_j$. It can be checked that
$-1 \le \rho_{ij} \le 1$ or, equivalently, $|\rho_{ij}| \le 1$.
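To make the covariance and the correlation coefficient concrete, the following sketch computes them for a small joint probability mass function. The table `joint_pmf` is a made-up example, not taken from the text, and the correlation coefficient is assumed to be normalized in the usual way, as the covariance divided by the product of the two standard deviations.

```python
import math

# Hypothetical joint p.m.f. of two binary random variables (X_1, X_2).
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def expectation(func):
    """E[func(X_1, X_2)] under joint_pmf."""
    return sum(p * func(x1, x2) for (x1, x2), p in joint_pmf.items())


m1 = expectation(lambda x1, x2: x1)                        # mean of X_1
m2 = expectation(lambda x1, x2: x2)                        # mean of X_2
mu_11 = expectation(lambda x1, x2: (x1 - m1) ** 2)         # variance of X_1
mu_22 = expectation(lambda x1, x2: (x2 - m2) ** 2)         # variance of X_2
mu_12 = expectation(lambda x1, x2: (x1 - m1) * (x2 - m2))  # covariance
rho_12 = mu_12 / math.sqrt(mu_11 * mu_22)                  # correlation coefficient

print(mu_12, rho_12)
```

Because the two variables agree with probability 0.8, the correlation coefficient is positive but strictly less than one.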
The matrix $\Lambda_X$ is defined by:

$$\Lambda_X = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{bmatrix} \tag{1.35}$$

and is called the covariance matrix. Note that $\mu_{ij} = \mu_{ji}$; thus
$\Lambda_X$ is a symmetric matrix and, using the Schwarz inequality given by
Eq. (1.33), we have:

$$|\mu_{ij}| \le \mu_{ii}^{1/2}\, \mu_{jj}^{1/2} \tag{1.36}$$

which verifies $|\rho_{ij}| \le 1$. If $|\Lambda_X| \neq 0$ or, equivalently,
the matrix $\Lambda_X$ has the rank n, we say $\Lambda_X$ is nonsingular.


1.6.2 Convergence 

Let $X_1, \ldots, X_n, \ldots$ and X be random variables defined from
$S \to R$. Then the set $A = \{\xi : X_n(\xi) \to X(\xi)\}$ is an event (that is,
$A \subset S$). Thus the probability that $X_n$ converges to X is defined.

There are several criteria of convergence. The following modes are defined
for both real and complex valued random variables as $n \to \infty$:

(1) $X_n$ converges in probability (or P-measure) to X, if for any given
$\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ (or
$\lim P(|X_n - X| > \epsilon) = 0$ as $n \to \infty$).

(2) $X_n$ converges in quadratic mean or mean square (m.s.) to X if
$E(|X_n - X|^2) \to 0$.

(3) $X_n$ converges with probability one or “almost everywhere” to X if
$P(X_n \to X) = 1$, or, equivalently, $P(X_n \not\to X) = 0$.

1.7 NORMAL DISTRIBUTIONS AND CHARACTERISTIC
EQUATIONS

The most important distribution is the normal distribution. The normal
p.d.f. $f_X(x)$ is defined as:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right] \tag{1.37}$$

where X is a random variable (one-dimensional).

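The error function erf(x) is defined as:

$$\operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_{0}^{x} \exp\left(-\frac{y^2}{2}\right) dy \tag{1.38}$$

It can be easily verified that:

$$F_X(x) = \frac{1}{2} + \operatorname{erf}\left(\frac{x - m}{\sigma}\right) \tag{1.39}$$

Note that if we take the derivative of F(x) we get f(x), i.e.,

$$\frac{d}{dx}\left[\frac{1}{2} + \operatorname{erf}\left(\frac{x - m}{\sigma}\right)\right] = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right] \tag{1.40}$$

as asserted.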

Note that in the above equation we have used the Fundamental Theorem
of Calculus, which states: If

$$G(x) = \int_{h_1(x)}^{h_2(x)} g(y)\, dy$$

where $h_1$ and $h_2$ are differentiable and g is continuous, then

$$\frac{dG(x)}{dx} = g(h_2(x)) \frac{dh_2}{dx} - g(h_1(x)) \frac{dh_1}{dx}$$

From the above equation we get:

$$P[x_1 < X \le x_2] = \operatorname{erf}\left(\frac{x_2 - m}{\sigma}\right) - \operatorname{erf}\left(\frac{x_1 - m}{\sigma}\right)$$


It can be verified that for the normal distribution the p.d.f. is symmetric
about the mean m and

$$E\{(X - m)^n\} = \begin{cases} 0, & n \text{ odd} \\ 1 \cdot 3 \cdots (n - 1)\, \sigma^n, & n = 2k \text{ (even)} \end{cases}$$

Also, it can be shown that if $X_1$ and $X_2$ are independent normal random
variables, with respective means and variances $(m_1, \sigma_1^2)$ and
$(m_2, \sigma_2^2)$, then their sum $X = X_1 + X_2$ is also normal with mean
$m = m_1 + m_2$ and variance $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Thus, the
summation of independent normal random variables produces a new normal
random variable. However, the “Central Limit Theorem” states (under fairly
wide conditions) that the sum of a large number of independent random
variables is approximately normally distributed, even though each individual
random variable may not be normal.

1.7.1 The Vector Case 

Let $X = (X_1, X_2, \ldots, X_n)^T$, where T is the transpose, be a normally
distributed random vector; thus,

$$f_X(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} |\Lambda|^{1/2}} \exp\left[-\frac{1}{2}(x - m)^T \Lambda^{-1} (x - m)\right] \tag{1.41}$$

where $\Lambda$ is the covariance of X, i.e.,
$\Lambda = E[(X - m)(X - m)^T]$, $|\Lambda|$ is the determinant of $\Lambda$, and

$$m = (m_1, \ldots, m_n)^T = E[X]$$

It can be shown that $\Lambda$ can also be written as

$$\Lambda = E(XX^T) - mm^T$$

Notationally we can write $f_X(x) = G(x; m, \Lambda)$, which means the Gaussian
density of X has the mean m and the covariance $\Lambda$.

In order to derive some important properties of the normal random vectors,
we need some basic definitions.


1.8 THE CHARACTERISTIC FUNCTION

Recalling the one-dimensional case, let X be a (one-dimensional) random
variable. Then the characteristic function of X is defined as:

$$C(u) = E[\exp(juX)] = \int_{-\infty}^{\infty} \exp(jux) f_X(x)\, dx \tag{1.42}$$

It is seen that the characteristic function is the Fourier transform of
$f_X(x)$; however, the positive sign in the exponent simply means that we must
use the negative sign in finding the inverse. Thus, the density function
$f_X(x)$ can be obtained from (using the Fourier transform pair):

$$f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} C(u) \exp(-jux)\, du \tag{1.43}$$

For a discussion of the Fourier transforms, see Appendix C.


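It can be shown that

$$m_k = \frac{1}{j^k} \left. \frac{d^k C(u)}{du^k} \right|_{u=0}$$

using Eq. (1.42) where

$$\exp(jux) = \sum_{k=0}^{\infty} \frac{(jux)^k}{k!}$$

and making use of

$$m_k = \int_{-\infty}^{\infty} x^k f_X(x)\, dx$$

The most useful property of the characteristic function is that the
characteristic function of a sum of independent random variables is the
product of their characteristic functions. It is also used to simplify
calculations.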


1.9 DEFINITION EXTENDED TO RANDOM VECTORS

The characteristic function of a random vector $X = (X_1, \ldots, X_n)^T$ is
defined as:

$$C(u) = C(u_1, \ldots, u_n) = E[\exp(ju^TX)] = \int \cdots \int \exp(ju^Tx)\, f_X(x_1, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n \tag{1.44}$$

Now let us apply the definition given by Eq. (1.44) to the Gaussian random
vector $X = (X_1, \ldots, X_n)^T$ and make the following claim:

Theorem 2

The characteristic function of the Gaussian random vector X is given by:

$$C(u) = \exp\left(ju^Tm - \frac{1}{2}u^T \Lambda u\right)$$

Proof

Left as an exercise.

Theorem 3

If two normal (Gaussian) vectors X and Y, with respective mean vectors
$m_X$ and $m_Y$, are uncorrelated, then they are statistically
independent.

Proof 

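Let X be n-dimensional and Y be m-dimensional with respective covariances
$\Lambda_X$ and $\Lambda_Y$.

Define a vector

$$Z = \begin{bmatrix} X \\ Y \end{bmatrix}, \qquad m_Z = \begin{bmatrix} m_X \\ m_Y \end{bmatrix}$$

Define $\Lambda_{XY}$ (cross-covariance):

$$\Lambda_{XY} = E[(X - m_X)(Y - m_Y)^T]$$

Before proving the assertion, observe that
$\Lambda_{XY}^T = E[(Y - m_Y)(X - m_X)^T] = \Lambda_{YX}$. Let us now calculate
$\Lambda_Z$:

$$\Lambda_Z = E[(Z - m_Z)(Z - m_Z)^T] = \begin{bmatrix} \Lambda_X & \Lambda_{XY} \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix}$$

Then

$$f_{XY}(x, y) = f_Z(z) = \frac{1}{(2\pi)^{(n+m)/2} |\Lambda_Z|^{1/2}} \exp\left[-\frac{1}{2}(z - m_Z)^T \Lambda_Z^{-1} (z - m_Z)\right]$$

If X and Y are uncorrelated, then $\Lambda_{XY} = 0$; hence

$$\Lambda_Z = \begin{bmatrix} \Lambda_X & 0 \\ 0 & \Lambda_Y \end{bmatrix}, \qquad \Lambda_Z^{-1} = \begin{bmatrix} \Lambda_X^{-1} & 0 \\ 0 & \Lambda_Y^{-1} \end{bmatrix}, \qquad \det \Lambda_Z = (\det \Lambda_X)(\det \Lambda_Y)$$

This substituted in f(x, y) yields: $f(x, y) = f_X(x) f_Y(y)$. Done!

Theorem 4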


If X and Y are specified as in Theorem 3, then we claim:

$$E(X \mid Y) = E[(X_1, \ldots, X_n)^T \mid (Y_1, \ldots, Y_m)] = m_X + \Lambda_{XY} \Lambda_Y^{-1} (Y - m_Y)$$

and the conditional covariance matrix $\Lambda_{X|Y}$ is defined by:

$$\Lambda_{X|Y} = E\{[X - E(X \mid Y)][X - E(X \mid Y)]^T\} = \Lambda_X - \Lambda_{XY} \Lambda_Y^{-1} \Lambda_{YX}$$

The proof is simple but lengthy and has been omitted.
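For scalar X and Y the matrices in Theorem 4 reduce to numbers, and the result can be evaluated directly. The sketch below is a minimal illustration under that assumption (n = m = 1, jointly Gaussian); the function name `conditional_gaussian` and the numerical values are hypothetical.

```python
def conditional_gaussian(m_x, m_y, var_x, var_y, cov_xy, y):
    """Scalar case of Theorem 4: conditional mean and variance of X given Y = y.

    mean     = m_x + cov_xy * (1 / var_y) * (y - m_y)
    variance = var_x - cov_xy * (1 / var_y) * cov_xy
    """
    gain = cov_xy / var_y
    cond_mean = m_x + gain * (y - m_y)
    cond_var = var_x - gain * cov_xy
    return cond_mean, cond_var


# Illustrative numbers: correlated pair, then an uncorrelated pair.
print(conditional_gaussian(m_x=1.0, m_y=2.0, var_x=4.0, var_y=9.0, cov_xy=3.0, y=5.0))
print(conditional_gaussian(m_x=1.0, m_y=2.0, var_x=4.0, var_y=9.0, cov_xy=0.0, y=5.0))
```

In the uncorrelated case the observation of Y changes neither the mean nor the variance of X, which agrees with Theorem 3.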
EXERCISES

1.1 An urn contains 4 green and 6 blue marbles. Two marbles are drawn out
together. One of them is tested and found to be blue. Find the
probability that the other one is also blue.

1.2 Let A and B be independent events associated with an experiment. If
the probability that A or B occurs is 0.7, while the probability of
occurrence of A is 0.3, determine the probability of occurrence of B.

1.3 Three dice are thrown. Find the probabilities of the events of obtaining
the sum of 10, 11, and 12 points.

1.4 A continuous random variable X has the distribution function:

$$F_X(x) = \begin{cases} 1 - (1 + ax) \exp(-ax), & \text{if } x \ge 0 \\ 0, & \text{if } x < 0 \end{cases}$$

(a) Find the characteristic function.

(b) Find the mean and the standard deviation.

1.5 Let the joint probability density function of the random vector (X, Y) be
given by:

$$f_{XY}(x, y) = \begin{cases} xy \exp[-(x^2 + y^2)/2], & \text{if } x \ge 0 \text{ and } y \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

(a) Find $f_X(x)$, $f_Y(y)$, $f(x \mid y)$, and $f(y \mid x)$.

(b) Are the random variables X and Y independent?

1.6 In the previous problem, if in addition we have the random variables Z
and W given by:

(a) Z = aX + bY, W = cX + dY

(b) $Z = YX^2\, U(X)$, $W = XY^2\, U(Y)$

where $U(\cdot)$ is a unit step function, find $f_{ZW}(z, w)$.

1.7 Find the probability density functions of Z and W. Given:

$$f_{XY}(x, y) = \begin{cases} 2 \exp\{-[x^2 + 2xy + 2y^2]\}, & \text{if } x \text{ and } y > 0 \\ 0, & \text{otherwise} \end{cases}$$

(a) Determine the mean and the variance of the random variable Z = XY.

(b) Determine the mean and the variance of the random variable
$W = X^2 + Y$.

1.8 If X and Y are independent random variables such that:

$$f_X(x) = \begin{cases} \dfrac{1}{\pi} \dfrac{1}{(1 - x^2)^{1/2}}, & \text{if } |x| < 1 \\ 0, & \text{otherwise} \end{cases}$$

$$f_Y(y) = \frac{y}{k^2} \exp\left[-\frac{y^2}{2k^2}\right] U(y)$$

where U(y) is a unit step function. Show that the random variable W =
XY is normal with mean zero and variance $k^2$.

1.9 If in a vector case of a normal random vector, n = 2, $m_1 = m_2 = 0$ and
$\mu_{11} = \mu_{22} = 1$, show that:

$$f(x_1, x_2; \rho) = \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left[-\frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2(1 - \rho^2)}\right]$$

where $\rho = \rho_{21} = \rho_{12}$.


1.10

(a) If A is an m × n matrix such that

$$Z = AX$$

[Figure: block diagram, input X, LINEAR OPERATOR A, output Z]

show that if X is normal so is Z. Use the property of characteristic
equations given by Theorem 2. That is, show that the characteristic function
of Z is:

$$C_Z(u) = \exp\left(ju^T A m_X - \frac{1}{2} u^T A \Lambda_X A^T u\right)$$

(b) Show that $\Lambda_Z = A \Lambda_X A^T$.

CHAPTER 2

STOCHASTIC PROCESSES 

2.1 INTRODUCTION 

Very often we are interested in observations that are made over a period of 
time and that are affected by random chance. This situation is termed a 
stochastic process and is defined below. 


2.2 DEFINITIONS AND EXAMPLES 

Definition 1 

A stochastic process $X(t, \omega)$ is a function of two variables, where
$\omega$ is an element of the sample space and t is a parameter (time) which
belongs to a set T (time interval).

Definition 2

For every $\omega_0 \in S$ (sample space), the function $X(t, \omega_0)$ is
called a sample function of the process.

The process $X(t, \omega)$, in general, can be complex, but, without any loss of
generality, we shall discuss $X(t, \omega)$ when it is real. Thus, to each
sample point $\omega \in S$ (sample space), we are assigning a waveform $X_t$,
which is the function of t (time) such that:

$$X_t : \omega \to X(t, \omega)$$

Hence, each sample space will have a collection of waveforms, each assigned
to a member $\omega \in S$. The collection of all of these waveforms (as many as
the cardinality of S) is called an ensemble. Thus, each individual member of
the ensemble is a sample function.


Example 1 

Assume that we toss a coin twice in succession. Then, our sample space S
is the collection of four outcomes:

S = {HH, TT, HT, TH}

with the outcomes labeled $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$,
respectively. There exist four sample points $\omega_1$, $\omega_2$,
$\omega_3$, and $\omega_4$. The probability of each occurrence is 1/4 (the coin
is a fair one).

Let us now define a function $X_t(\cdot) : S \to R$ such that:

$$X_t(\omega_k) = X(t, \omega_k) = \sin kt$$

Thus, the ensemble consists of four elements (as many as the cardinality of S,
which is 4). Let us denote the ensemble by $\mathcal{E}$. Thus,

$$\mathcal{E} = \{\sin t, \sin 2t, \sin 3t, \sin 4t\}$$

and the probability assigned to each waveform is also 1/4.
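The ensemble of Example 1 is small enough to write out in code. The sketch below is only an illustration; the names `sample_function` and `ensemble_mean` are assumptions, and the ensemble average computed at a fixed time t is the probability-weighted sum of the four waveforms evaluated at that time.

```python
import math

sample_points = [1, 2, 3, 4]   # index k of the sample point omega_k
prob = 0.25                    # probability of each sample point (fair coin, two tosses)


def sample_function(k, t):
    """Waveform assigned to omega_k, evaluated at time t: sin(k t)."""
    return math.sin(k * t)


def ensemble_mean(t):
    """Average over the ensemble at a fixed time t."""
    return sum(prob * sample_function(k, t) for k in sample_points)


for t in (0.0, 0.5, math.pi / 2):
    print(t, [round(sample_function(k, t), 4) for k in sample_points], round(ensemble_mean(t), 4))
```

Once k is fixed, `sample_function(k, t)` is an ordinary deterministic function of t; the randomness lies only in which k is selected.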

Remark 1. The cardinality (number of sample points) corresponding to the
sample space S may be finite, denumerably infinite, or dense.


Remark 2. For a stochastic process, $X_t(\omega)$ or $X(t, \omega)$ is an
appropriate designation. However, in common practice the process is represented
by X(t), which actually means $X(t, \omega)$.


2.2.1 More Words About X(t)

The notation $X(t, \omega)$ may be better understood through a physical
phenomenon. Consider a system such as a radar antenna receiver. Suppose the
noise signal at the output is of interest. Each time we turn on the system, it
will yield a different noise waveform. The collection of all of the noise
waveforms is the ensemble of this process (see figure below).



It is important to mention that each sample function (waveform) is
assigned to a single point $\omega$. Thus, after $\omega$ is specified, the
waveform is deterministic (not random). The randomness is associated with each
sample being chosen (occurrence of a sample).

Example 2 

Suppose a receiver (antenna) detects signals of the form:

$$X(t) = \alpha \cos(\omega t + \theta)$$

where $\alpha$ (amplitude) and $\theta$ are both random. Suppose by some sort of
practical experience we know the distribution functions of $\theta$ and
$\alpha$ (for example, $\theta$ or $\alpha$ could be Poisson, Gaussian, uniform,
or any other probability density function).

Let us assume $\alpha$ is Gaussian and $\theta$ is uniform over the interval
$[0, 2\pi)$. Then,

$$f_\alpha(\alpha) = \frac{1}{\sigma_\alpha \sqrt{2\pi}} \exp\left[-\frac{(\alpha - m_\alpha)^2}{2\sigma_\alpha^2}\right]$$

and

$$f_\theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & \theta \in [0, 2\pi) \\ 0, & \text{elsewhere} \end{cases}$$

Corresponding to each sample function, $\alpha$ and $\theta$ are assumed to be
constant, but they definitely vary from one sample function to the other.
Example 3 

Consider

$$X(t) = \alpha t + b$$

where $\alpha$ is a random variable, and b is a constant.

Remark 3. For the one-dimensional case $X(t, \omega)$ becomes a random
variable for each fixed $t = t_1$ since $X(t_1, \omega)$ becomes a function of
$\omega$ only, i.e.,

$$X(t_1, \omega) : S \to R$$

which is the definition of the random variable.

Remark 4. Remember that we denote $X(t_1, \omega)$ (or $X_{t_1}(\omega)$) by
either $X_{t_1}$ or $X(t_1)$.

2.3 FIRST-ORDER STATISTICS 

The distribution of a real process X(t) for a fixed $t = t_1$ is defined:

$$F_X(x, t_1) = P\{X(t_1) \le x\} \tag{2.1}$$

Remember: $\{X(t_1) \le x\} = \{\omega \in S : X(t_1, \omega) \le x\}$

Definition 3

The first-order statistics are those items of information that can be
completely determined from $F_X(x, t)$, such as $f_X(x, t)$, $m(t) = E[X(t)]$
or $E[X^2(t)]$, $\sigma^2_{X(t)}$, etc.

Definition 4

A nonnegative function $f_X(x, t) \ge 0$, such that

$$F_X(x, t) = \int_{-\infty}^{x} f_X(\alpha, t)\, d\alpha \tag{2.2}$$

is called the probability density function (p.d.f.). If $F_X(x, t)$ is
differentiable, then, from Eq. (2.2):

$$f_X(x, t) = \frac{\partial F_X(x, t)}{\partial x} \tag{2.3}$$

Note that condition (2.2) is a weaker condition than that of (2.3), because
f(x, t) may exist even though $F_X(x, t)$ may not be differentiable.

Note that:

$$E[X(t)] = \int_{-\infty}^{\infty} x f_X(x, t)\, dx$$

will be denoted as either m(t) or $\eta(t)$ in what follows.

Example 4

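Let us continue Example 3, $X(t) = \alpha t + b$, where t > 0, b is a constant,
and $\alpha$ is a Gaussian random variable:

$$f_\alpha(\alpha) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\alpha^2}{2}\right) \tag{2.4}$$

Find the first-order p.d.f. $f_X(x, t)$.

Solution

From $X(t) = \alpha t + b$, we get $\alpha = (1/t)(X - b)$. We know:

$$f_X(x, t) = \frac{f_\alpha(\alpha)}{\left| \dfrac{dx}{d\alpha} \right|}$$

Now $dx/d\alpha = t$; since t > 0, we have

$$\left| \frac{dx}{d\alpha} \right| = t$$

and

$$\alpha = \frac{1}{t}(x - b)$$

Hence,

$$f_X(x, t) = \frac{1}{t\sqrt{2\pi}} \exp\left[-\frac{(x - b)^2}{2t^2}\right]$$

Important Reminder. From now on, we shall drop the subscript X from
$F_X(x, t)$ and $f_X(x, t)$ whenever it is appropriate.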

Example 5 

Obtain the mean and the variance of $X(t)$.

Solution

$$EX(t) = m(t) = tE(a) + E(b) = t \cdot 0 + b = b$$

Note: From Eq. (2.4) it is obvious that $E(a) = 0$, $\sigma_a^2 = 1 = E(a^2)$.
Since

$$\sigma^2_{X(t)} = E[X^2(t)] - E^2[X(t)]$$

then we must calculate $E[X^2(t)]$:

$$E[X(t)^2] = E[(at + b)^2] = E[t^2a^2 + b^2 + 2tab]$$

$$= t^2E(a^2) + b^2 + 2tbE(a) = t^2(1) + b^2 = t^2 + b^2$$

Hence,

$$\sigma^2_{X(t)} = (t^2 + b^2) - E^2(X) = (t^2 + b^2) - b^2 = t^2$$

Remark 5. Regardless of the parameter $t$, the mean of $X(t)$ is $b$; however,
both $E(X^2(t))$ and $\sigma^2_{X(t)}$ are dependent on $t$.
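
The results of Examples 4 and 5 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: the function names, the sample size, and the seed are illustrative assumptions, and the sample moments only approximate $m(t) = b$ and $\sigma^2_{X(t)} = t^2$.

```python
import random


def sample_process(t, b, n_samples, seed=0):
    """Draw n_samples values of X(t) = a*t + b, with a ~ N(0, 1) as in Eq. (2.4)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * t + b for _ in range(n_samples)]


def mean_and_variance(t, b, n_samples=100_000, seed=0):
    """Return the sample mean and sample variance of X(t) at a fixed time t."""
    xs = sample_process(t, b, n_samples, seed)
    mean = sum(xs) / n_samples
    variance = sum((x - mean) ** 2 for x in xs) / n_samples
    return mean, variance


if __name__ == "__main__":
    b = 2.0  # hypothetical constant offset
    for t in (0.5, 1.0, 3.0):
        m, v = mean_and_variance(t, b)
        print(f"t={t}: mean ~ {m:.3f} (theory {b}), variance ~ {v:.3f} (theory {t * t})")
```

The mean stays near $b$ for every $t$, while the variance grows as $t^2$, which is the content of Remark 5.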

Example 6 

Consider the random process $X(t)$ given by:

$$X(t) = A\cos(\omega_0 t + \Theta)$$

where $\Theta$ is a random variable which is uniformly distributed over $[0, 2\pi)$ and
the amplitude $A$ is constant.

Obtain the following first-order statistics: 

(a) Probability density function 

(b) m(t) 

(c) The variance of X 
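Solution

(a) We can consider the sample function $x$ to be

$$x = A\cos(\omega_0 t + \theta)$$

where $x$ and $\theta$ denote the parameters (possible values of the random
variables $X$ and $\Theta$, respectively). Since $\Theta$ is uniformly distributed, we
get:

$$f_\Theta(\theta) = \begin{cases} \dfrac{1}{2\pi}, & \theta \in [0, 2\pi) \\ 0, & \text{otherwise} \end{cases}$$

The probability density function $f_X(x, t)$ can be obtained as follows:

$$f_X(x, t) = \frac{f_\Theta(\theta_1)}{\left|\dfrac{dx}{d\theta}\right|_{\theta=\theta_1}} + \frac{f_\Theta(\theta_2)}{\left|\dfrac{dx}{d\theta}\right|_{\theta=\theta_2}}$$

because there are two values of $\theta \in [0, 2\pi)$ such that $x = A\cos(\omega_0 t + \theta)$:
one value, $\theta_1$, is obtained where $0 < \omega_0 t + \theta < \pi$ and the other, $\theta_2$,
is obtained where $\pi < \omega_0 t + \theta < 2\pi$.

Now

$$\frac{dx}{d\theta} = -A\sin(\omega_0 t + \theta) = -A\sqrt{1 - \cos^2(\omega_0 t + \theta)}$$

$$= -\sqrt{A^2 - x^2}, \quad \text{for } 0 < \omega_0 t + \theta < \pi \text{ and } |x| < A$$

and

$$\left|\frac{dx}{d\theta}\right|_{\theta=\theta_1} = \left|\frac{dx}{d\theta}\right|_{\theta=\theta_2} = \sqrt{A^2 - x^2}$$

Thus,

$$f_X(x, t) = \begin{cases} \dfrac{2}{2\pi\sqrt{A^2 - x^2}} = \dfrac{1}{\pi\sqrt{A^2 - x^2}}, & |x| < A \\ 0, & \text{otherwise} \end{cases}$$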


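(b) $$m(t) = E[X(t)] = A\int_0^{2\pi} \cos(\omega_0 t + \theta) f(\theta)\, d\theta = A\int_0^{2\pi} \cos(\omega_0 t + \theta) \frac{1}{2\pi}\, d\theta = 0$$

Alternatively,

$$m(t) = E[X(t)] = \int_{-\infty}^{\infty} x f(x, t)\, dx = \int_{-A}^{A} \frac{x}{\pi(A^2 - x^2)^{1/2}}\, dx = 0$$

(c) $$E[X^2(t)] = \int_0^{2\pi} A^2\cos^2(\omega_0 t + \theta) \frac{1}{2\pi}\, d\theta = \frac{A^2}{4\pi}\int_0^{2\pi} [1 + \cos 2(\omega_0 t + \theta)]\, d\theta = \frac{A^2}{4\pi}(2\pi) = \frac{A^2}{2}$$

$$\sigma_X^2 = E(X^2(t)) - E^2(X(t)) = \frac{A^2}{2} - 0 = \frac{A^2}{2}$$

Remember that: $\cos^2\theta = \tfrac{1}{2}(1 + \cos 2\theta)$.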
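
A quick simulation can make the three answers of Example 6 concrete. The sketch below is illustrative only: the function names, the values of $A$, $\omega_0$ and $t$, and the seed are assumptions, not part of the text. It compares the sample mean with 0, the sample variance with $A^2/2$, and the empirical fraction of samples below a level $x$ with the distribution obtained by integrating $f_X(x,t) = 1/(\pi\sqrt{A^2 - x^2})$.

```python
import math
import random


def simulate_cosine(A, w0, t, n_samples=200_000, seed=1):
    """Sample X(t) = A*cos(w0*t + Theta) with Theta uniform over [0, 2*pi)."""
    rng = random.Random(seed)
    return [A * math.cos(w0 * t + rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_samples)]


def arcsine_cdf(x, A):
    """Integral of 1 / (pi * sqrt(A^2 - u^2)) from -A to x."""
    if x <= -A:
        return 0.0
    if x >= A:
        return 1.0
    return 0.5 + math.asin(x / A) / math.pi


def summarize(A, w0, t):
    """Return (sample mean, sample variance, fraction of samples <= A/2)."""
    xs = simulate_cosine(A, w0, t)
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    frac_below = sum(1 for x in xs if x <= 0.5 * A) / n
    return mean, variance, frac_below


if __name__ == "__main__":
    A, w0, t = 2.0, 3.0, 0.7  # hypothetical amplitude, frequency, and time
    mean, variance, frac_below = summarize(A, w0, t)
    print("mean     ~", round(mean, 4), "(theory 0)")
    print("variance ~", round(variance, 4), "(theory", A * A / 2, ")")
    print("P{X <= A/2} ~", round(frac_below, 4), "(theory", round(arcsine_cdf(0.5 * A, A), 4), ")")
```

None of the three statistics depends on the chosen $t$, in agreement with the derivation.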



2.4 SECOND AND HIGHER ORDER STATISTICS 

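For any arbitrary set of $t$ values $t_1, t_2, \ldots, t_n$ and random variables $X(t_1)
= X_1, \ldots, X(t_n) = X_n$, we define the $n$-dimensional joint distribution as:

$$F(x_1, \ldots, x_n, t_1, \ldots, t_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}$$

and the p.d.f. $f(x_1, \ldots, x_n, t_1, \ldots, t_n)$ is a function such that:

(1) $f(x_1, \ldots, x_n, t_1, \ldots, t_n) \ge 0$, for all $x = [x_1, \ldots, x_n]^T$, and

(2) $$F(x_1, \ldots, x_n, t_1, \ldots, t_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(\xi_1, \ldots, \xi_n, t_1, \ldots, t_n)\, d\xi_1 \cdots d\xi_n$$

Again, if $F$ has partial derivatives with respect to the $x_i$, then:

$$f(x_1, \ldots, x_n, t_1, \ldots, t_n) = \frac{\partial^n F(x_1, \ldots, x_n, t_1, \ldots, t_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n}$$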
2.4.1 Autocorrelation; Covariance

The correlation between two waveforms from the same ensemble gives
some useful information about the waveform. The first-order statistics do not
yield all the information about the random process, since the first-order p.d.f.
cannot indicate the dependence of the random process (signal) at two different
times (remember that $X(t_1)$ and $X(t_2)$ are two different random variables).
Thus, it would be advantageous to obtain a measure relating the
process $X(t_1)$ to $X(t_2)$.

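For the real process $X(t)$, the autocorrelation function $R_X(t_1, t_2)$ is
defined as:

$$R_X(t_1, t_2) = E[X(t_1)X(t_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2 f(x_1, x_2, t_1, t_2)\, dx_1\, dx_2 \qquad (2.5)$$

and it can easily be seen that it is a function of $t_1$ and $t_2$.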

The corresponding covariance (autocovariance) of $X(t)$ is defined as:

$$C_X(t_1, t_2) = E\{[X(t_1) - m_1][X(t_2) - m_2]\} \qquad (2.6)$$

Note that:

$$C_X(t_1, t_2) = E\{X(t_1)X(t_2)\} - m_1 m_2 = R_X(t_1, t_2) - m_1 m_2$$

Thus, from (2.6), it is obvious that if $t_1 = t_2 = t$, then:

$$C_X(t, t) = \sigma^2_{X(t)}$$
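
As an illustration of (2.5) and (2.6), consider again the process $X(t) = at + b$ of Example 4, for which $R_X(t_1, t_2) = t_1 t_2 + b^2$ and $C_X(t_1, t_2) = t_1 t_2$ follow directly from $E(a) = 0$ and $E(a^2) = 1$. The sketch below estimates both quantities from samples; the function name, the parameter values, and the seed are assumptions made for illustration only.

```python
import random


def estimate_r_and_c(t1, t2, b, n_samples=200_000, seed=2):
    """Estimate R_X(t1, t2) and C_X(t1, t2) for X(t) = a*t + b, a ~ N(0, 1)."""
    rng = random.Random(seed)
    sum_x1 = sum_x2 = sum_x1x2 = 0.0
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)      # one draw of a fixes one sample function
        x1 = a * t1 + b              # value of that sample function at t1
        x2 = a * t2 + b              # value of the same sample function at t2
        sum_x1 += x1
        sum_x2 += x2
        sum_x1x2 += x1 * x2
    m1 = sum_x1 / n_samples
    m2 = sum_x2 / n_samples
    r = sum_x1x2 / n_samples         # sample version of E[X(t1) X(t2)]
    c = r - m1 * m2                  # C = R - m1*m2, as noted below Eq. (2.6)
    return r, c


if __name__ == "__main__":
    t1, t2, b = 1.0, 2.0, 3.0  # hypothetical values
    r, c = estimate_r_and_c(t1, t2, b)
    print("R ~", round(r, 3), "(theory", t1 * t2 + b * b, ")")
    print("C ~", round(c, 3), "(theory", t1 * t2, ")")
```

Both values of the pair $(t_1, t_2)$ must be taken from the same sample function, which is why a single draw of $a$ is reused inside the loop.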


More Definitions 

If $X(t)$ and $Y(t)$ are two processes that (one or both) could be complex,
then Eqs. (2.5) and (2.6) are generalized as follows:

$$R_X(t_1, t_2) = E[X(t_1)X^*(t_2)] \qquad (2.7)$$

$$C_X(t_1, t_2) = E\{[X(t_1) - m_1][X^*(t_2) - m_2^*]\} = R_X(t_1, t_2) - m_1 m_2^* \qquad (2.8)$$

where $*$ denotes the complex conjugate.

The cross-correlation between $X(t)$ and $Y(t)$ is defined as:

$$R_{XY}(t_1, t_2) = E[X(t_1)Y^*(t_2)] \qquad (2.9)$$

and its corresponding cross-covariance as:

$$C_{XY}(t_1, t_2) = E\{[X(t_1) - m_X(t_1)][Y^*(t_2) - m_Y^*(t_2)]\} = R_{XY}(t_1, t_2) - m_X(t_1)\, m_Y^*(t_2) \qquad (2.10)$$


It is obvious that the $n$th-order p.d.f. contains all the information about the
first $(n - 1)$ p.d.f.'s. For example, we shall illustrate this point by the second-order
p.d.f. Let $f(x_1, x_2, t_1, t_2)$ be given, then:

$$f(x_1, x_2, t_1, t_2) = f(x_1, t_1)\, f(x_2, t_2 \mid x_1, t_1)$$

We know that

$$f(x_1, t_1) = \int_{-\infty}^{\infty} f(x_1, x_2, t_1, t_2)\, dx_2$$

and the conditional p.d.f. can be obtained as the ratio of $f(x_1, x_2, t_1, t_2)$
over $f(x_1, t_1)$.

The correlation coefficient between $X(t_1)$ and $X(t_2)$ is defined as:

$$\rho_{12} = \frac{C_X(t_1, t_2)}{\sigma_{X(t_1)}\, \sigma_{X(t_2)}} \qquad (2.11)$$


as expected. 
2.5 STATIONARY PROCESSES

Definition 5

A stochastic process $X(t)$ is said to be strictly stationary if the entire
family of its finite-dimensional distributions is invariant under a translation
in $t$. That is, for given time points $t_1, t_2, \ldots, t_n$, the distribution of
$X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_n + \tau)$ (for $X(\cdot)$ real or complex) is independent of $\tau$:

$$F(x_1, x_2, \ldots, x_n, t_1, \ldots, t_n) = F(x_1, x_2, \ldots, x_n, t_1 + \tau, \ldots, t_n + \tau) \qquad (2.12)$$

for all $n$. Thus, we need to check Eq. (2.12) for all finite $n$. For $n = 1$, since
$F(x, t) = F(x, t + \tau)$ or $f(x, t) = f(x, t + \tau)$ (if $F$ is differentiable), then:

$$EX(t) = EX(t + \tau), \quad \text{for all } \tau \qquad (2.13)$$

which implies $EX(t)$ must be constant. For example, let $\tau = -t$; then $EX(t) =
EX(t + \tau) = EX(t - t) = EX(0) = \text{constant}$ (that is, $EX(t) = EX(0)$ for all $t$ as
well).

Conclusion 1 

For a strictly stationary process, $EX(t)$ is constant and is independent of
time $t$.

Now if $R_X(t_1, t_2)$ exists for all $t_1$ and $t_2$, then by the definition of
strict stationarity:

$$R_X(t_1, t_2) = E[X(t_1)X^*(t_2)] = E[X(t_1 + \tau)X^*(t_2 + \tau)] \qquad (2.14)$$

Equation (2.14) is true for any $t_1$, $t_2$ and $\tau$. For the special case where
$\tau = -t_1$, $R_X(t_1, t_2)$ in Eq. (2.14) becomes:

$$R_X(t_1, t_2) = E[X(t_1)X^*(t_2)]$$

$$= E[X(t_1 + \tau)X^*(t_2 + \tau)]$$

$$= E[X(0)X^*(t_2 - t_1)] \qquad (2.15)$$

Thus, we have shown that $R_X(t_1, t_2)$ is a function of the time difference $t_2 - t_1$
(for the strictly stationary case).

Conclusion 2 

It turns out that for the strictly stationary case we have $R_X(t_1, t_2)$ as a
function of the time difference $t_2 - t_1$. From now on, when this condition
prevails, we shall write $R_X(t_1, t_2)$ as $R(t_2 - t_1)$.

Conclusion 3 

For strictly stationary processes, we have: 


$$EX(t) = \text{constant} = m \qquad (2.16a)$$

$$EX(t_1)X^*(t_2) = R(t_2 - t_1) \qquad (2.16b)$$


The condition given by Eq. (2.16) is a consequence of the strictly stationary
property (a necessary condition). In a strictly stationary process, we must
have at our disposal all of the joint distribution functions for $k = 1, \ldots, n$
(finite $n$) and, in addition, they must satisfy:

$$F(x_1, \ldots, x_k, t_1, \ldots, t_k) = F(x_1, \ldots, x_k, t_1 + \tau, \ldots, t_k + \tau)$$

for all $k$ and all $\tau$.


The above condition is very stringent. It turns out that very often the 
second-order statistics are sufficient to characterize many physical situations, 
which leads us to define some important terms. 

Definition 6 

The process X(t) is stationary in the wide sense, if conditions (2.16a) and 
(2.16b) are satisfied. 


2.5.1 Some Important Properties for the Wide-Sense Stationary 
Process X(t) 

(1) $R(t_2 - t_1) = R^*(t_1 - t_2)$ or, equivalently, $R(\tau) = R^*(-\tau)$, since $R(t_2 -
t_1) = E\{X(t_1)X^*(t_2)\} = (E\{X(t_2)X^*(t_1)\})^* = R^*(t_1 - t_2)$.

(2) Since $E|X(t)|^2 = E[X(t)X^*(t)] = R(0)$, then $\sigma^2_{X(t)} = R(0) - |m|^2$,
which is independent of time $t$.

(3) From the Cauchy-Schwarz inequality:

$$|E[X(t_1)X^*(t_2)]|^2 \le E[|X(t_1)|^2]\, E[|X(t_2)|^2] \;\Rightarrow\; |R(\tau)| \le R(0), \quad \text{for all } \tau$$


Example 7 

A quantized process has associated sample functions, where each sample
function consists of sequences of pulses of unit width.

The pulse amplitudes take the binary numbers +1 and -1 with equal probability.
The successive amplitudes are independent. Assume that the starting
point of each sample function is random and uniformly distributed over a unit
interval (denote the starting time as $\theta$). Find the correlation function of $X(t)$.

[Figure: sample functions of the binary pulse process; part (c) shows the pulse starting point $\theta$.]

Solution 

The random process has discrete values of +1 and -1. Let $X(t_1) = i$ and
$X(t_2) = j$, where $i$ and $j$ could be +1 or -1. Then,

$$R(t_1, t_2) = E[X(t_1)X(t_2)] = \sum_i \sum_j i\, j\, P(i, j)$$

where

$$P(i, j) = P\{X(t_1) = i \text{ and } X(t_2) = j\}$$

so that

$$R(t_1, t_2) = (1)(1)P(1, 1) + (1)(-1)P(1, -1) + (-1)(-1)P(-1, -1) + (-1)(1)P(-1, 1) \qquad (2.17)$$


Now if we obtain $P(i, j)$ for $i$ and $j$ corresponding to +1 or -1, we will be
done. These probabilities are obtained as follows:

$$P(1, 1) = P[X(t_2) = 1 \mid X(t_1) = 1]\, P[X(t_1) = 1]$$

For a sample function, let $\theta$ be the starting point of the pulse in which $t_1$ occurs
(uniformly distributed, see part (c) of the above figure). Now, with $t_1 < t_2$, $t_2$ either takes
place during the same pulse as $t_1$ (case 1) or during another pulse; we
now write:

$$P[X(t_2) = 1 \mid X(t_1) = 1] = P[t_2 < \theta + 1] + \tfrac{1}{2} P[t_2 > \theta + 1]$$

(The 1/2 is used because outside the pulse, given $X(t_1) = 1$, it is equally likely
that $X(t_2)$ be either +1 or -1.)

Now $P(1, 1)$ can be written as:

$$P(1, 1) = \left\{P[t_2 < \theta + 1] + \tfrac{1}{2} P[t_2 > \theta + 1]\right\} P[X(t_1) = 1]$$

$$= \tfrac{1}{2}\left\{P[t_2 < \theta + 1] + \tfrac{1}{2} P[t_2 > \theta + 1]\right\}$$

$$= \begin{cases} \tfrac{1}{2}\left\{1 - (t_2 - t_1) + \tfrac{1}{2}(t_2 - t_1)\right\}, & \text{if } t_2 - t_1 \le 1 \\ \tfrac{1}{2}\left[0 + \tfrac{1}{2}\right], & \text{if } t_2 - t_1 > 1 \end{cases}$$
Note that

$$P[t_2 < \theta + 1] = P[\theta > t_2 - 1] = 1 - P[\theta \le t_2 - 1] = 1 - F(t_2 - 1)$$

and remember that $\theta$ is uniformly distributed over $(t_1 - 1, t_1)$, so that $F(t) = t - (t_1 - 1) = t - t_1 + 1$ and

$$F(t_2 - 1) = t_2 - 1 - t_1 + 1 = t_2 - t_1, \quad \text{for the case } t_2 - t_1 < 1.$$

Because of symmetry, $P(1, 1) = P(-1, -1)$. In a similar manner, we will find:

$$P(1, -1) = P(-1, 1) = \begin{cases} \tfrac{1}{4}(t_2 - t_1), & \text{if } t_2 - t_1 \le 1 \\ \tfrac{1}{4}, & \text{if } t_2 - t_1 > 1 \end{cases}$$

Now, for $\tau = t_2 - t_1$ ($t_1$ could be larger than $t_2$), the general case $R_X(\tau)$
can be found (see Eq. 2.17):

$$R_X(\tau) = \begin{cases} 1 - |\tau|, & \text{if } |\tau| \le 1 \\ 0, & \text{if } |\tau| > 1 \end{cases}$$
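
The triangular correlation function of Example 7 can be reproduced by direct simulation. The following sketch is an illustration under stated assumptions: the function names, the reference time `t1`, the sample size, and the seed are hypothetical choices, and the pulse containing a time $t$ is identified by the integer part of $t - \theta$.

```python
import math
import random


def estimate_pulse_correlation(tau, t1=5.0, n_samples=100_000, seed=3):
    """Estimate E[X(t1) X(t1 + tau)] for the random binary pulse train.

    Pulses have unit width and start at theta + k (k integer), with theta
    uniform over a unit interval; amplitudes are independent and equal to
    +1 or -1 with probability 1/2 each.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        theta = rng.random()
        k1 = math.floor(t1 - theta)          # index of the pulse containing t1
        k2 = math.floor(t1 + tau - theta)    # index of the pulse containing t1 + tau
        x1 = rng.choice((-1, 1))
        # Same pulse: same amplitude. Different pulse: independent amplitude.
        x2 = x1 if k2 == k1 else rng.choice((-1, 1))
        total += x1 * x2
    return total / n_samples


def triangle(tau):
    """The result of Example 7: R(tau) = 1 - |tau| for |tau| <= 1, else 0."""
    return max(0.0, 1.0 - abs(tau))


if __name__ == "__main__":
    for tau in (-0.6, 0.0, 0.3, 1.5):
        print(f"tau={tau}: estimate {estimate_pulse_correlation(tau):.3f}, theory {triangle(tau):.3f}")
```

Changing `t1` should not change the estimates beyond sampling noise, which is consistent with the process being stationary in the wide sense.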





Henceforth, throughout the text, unless specified otherwise, by the stationarity
of a process $X(t)$ we mean stationarity in the wide sense.
Definition 7

Two processes $X(t)$ and $Y(t)$ are uncorrelated if, given any $t_1$ and $t_2$, we
have:

$$E[X(t_1)Y^*(t_2)] = m_X(t_1)\, m_Y^*(t_2) \qquad (2.18)$$

As a consequence of condition (2.18), we have:

$$C_{XY}(t_1, t_2) = E\{[X(t_1) - m_X(t_1)][Y^*(t_2) - m_Y^*(t_2)]\}$$

$$= E[X(t_1)Y^*(t_2)] - m_X(t_1)\, m_Y^*(t_2)$$

$$= m_X(t_1)\, m_Y^*(t_2) - m_X(t_1)\, m_Y^*(t_2) = 0$$

Definition 8

If $E[X(t_1)Y^*(t_2)] = 0$, then we say $X(t)$ and $Y(t)$ are orthogonal.

Note that $C_{XY}(t_1, t_2) = 0$ implies that $[X(t_1) - m_X(t_1)]$ and $[Y(t_2) -
m_Y(t_2)]$ are orthogonal processes.


2.6 CONTINUITY AND DIFFERENTIABILITY 

The continuity of the process $X(t)$ with respect to $t$ is restrictive. However,
the continuity in the quadratic mean (mean square) is not as restrictive. We
say the process $X(t)$ is continuous at $t = t_0$ in the quadratic mean (q.m.) if
$E[|X(t_0)|^2]$ exists for $t = t_0$, and

$$\lim_{\epsilon \to 0} E\{|X(t_0) - X(t_0 + \epsilon)|^2\} = 0 \qquad (2.19)$$

If condition (2.19) holds for every $t \in [a, b]$, then we say $X(t)$ is continuous
in the quadratic mean (mean square) in $[a, b]$. If condition (2.19) holds for
$t \in (-\infty, \infty)$, we say $X(t)$ is continuous (in the q.m.) everywhere.

It is left as an exercise to verify the following claims.

Claim 1. $X(t)$ is continuous in the q.m. at $t = t_0$ if and only if the
covariance $R(t_1, t_2)$ is continuous at $t_1 = t_2 = t_0$ (diagonal point or
element).
Note: In order to prove the above claim, we need to verify the important
relationship:

$$E[|X(t + \epsilon) - X(t)|^2] = R(t + \epsilon, t + \epsilon) - R(t, t + \epsilon) - R(t + \epsilon, t) + R(t, t) \qquad (2.20)$$

The continuity in the q.m. is much weaker than the sample continuity. A
classical counterexample is the Poisson process:

$$P\{X(t) = k\} = \frac{(\lambda t)^k}{k!} \exp[-\lambda t]$$

where each sample function $X(t)$ is of a staircase type and, therefore, discontinuous; however, the covariance $R(t_1, t_2)
= \lambda \min(t_1, t_2)$, for all $t_1$ and $t_2$, is continuous, which implies $X(t)$ is
continuous in the q.m. even though $X(t)$ is not continuous as a sample
function.
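
The Poisson counterexample can be illustrated numerically. For a Poisson process with rate $\lambda$, the increment over an interval of length $\epsilon$ is a Poisson random variable with parameter $\lambda\epsilon$, so its second moment is $\lambda\epsilon + (\lambda\epsilon)^2$, which tends to zero with $\epsilon$ even though every sample function jumps. The sketch below is only an illustration: the function names, the rate, the sample size, and the seed are assumptions, and the arrivals are generated from exponential interarrival times.

```python
import random


def poisson_increment(rate, length, rng):
    """Number of Poisson arrivals (given rate) in an interval of the given length."""
    count = 0
    elapsed = rng.expovariate(rate)      # time of the first arrival
    while elapsed <= length:
        count += 1
        elapsed += rng.expovariate(rate)  # add the next interarrival time
    return count


def mean_square_increment(rate, eps, n_samples=100_000, seed=5):
    """Estimate E|X(t + eps) - X(t)|^2 for a Poisson process with the given rate."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        n = poisson_increment(rate, eps, rng)
        total += n * n
    return total / n_samples


if __name__ == "__main__":
    rate = 2.0  # hypothetical value of lambda
    for eps in (0.5, 0.1, 0.01):
        theory = rate * eps + (rate * eps) ** 2
        print(f"eps={eps}: estimate {mean_square_increment(rate, eps):.4f}, theory {theory:.4f}")
```

The estimates shrink toward zero as `eps` decreases, which is the mean-square continuity expressed by (2.19).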

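If $X'(t)$ satisfies:

$$\lim_{\epsilon \to 0} E\left\{\left|\frac{X(t + \epsilon) - X(t)}{\epsilon} - X'(t)\right|^2\right\} = 0 \qquad (2.21)$$

we say $X'(t)$ is the derivative of $X(t)$ in the q.m. and we write:

$$\frac{X(t + \epsilon) - X(t)}{\epsilon} \xrightarrow[\epsilon \to 0]{\text{q.m.}} X'(t)$$

We can verify that (use Eq. 2.20):

$$E\left\{\left[\frac{X(t + \epsilon_1) - X(t)}{\epsilon_1}\right]\left[\frac{X(t + \epsilon_2) - X(t)}{\epsilon_2}\right]^*\right\} = \frac{R(t + \epsilon_1, t + \epsilon_2) - R(t + \epsilon_1, t) - R(t, t + \epsilon_2) + R(t, t)}{\epsilon_1 \epsilon_2} \qquad (2.22)$$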
Claim 2. The derivative $X'(t)$ of $X(t)$ exists in the q.m. if and only if

$$\frac{\partial^2 R(t_1, t_2)}{\partial t_1\, \partial t_2}$$

exists and is finite for $t_1 = t_2 = t$ (see Eq. 2.22) because, as $\epsilon_1$ and $\epsilon_2 \to 0$,
Eq. (2.22) becomes the second partial for $t_1 = t_2 = t$. Thus, the autocorrelation
of $X'(t)$ is given by:

$$\frac{\partial^2 R_{XX}(t_1, t_2)}{\partial t_1\, \partial t_2} \qquad (2.23)$$


By direct calculation, it can also be shown that:

$$R_{X'X}(t_1, t_2) = E[X'(t_1)X^*(t_2)] = \frac{\partial R_{XX}(t_1, t_2)}{\partial t_1} \qquad (2.24)$$


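$$R_{XX'}(t_1, t_2) \triangleq E[X(t_1)X'^*(t_2)] = \frac{\partial R_{XX}(t_1, t_2)}{\partial t_2} \qquad (2.25)$$

$$R_{X'X'}(t_1, t_2) \triangleq E\{X'(t_1)X'^*(t_2)\} = \frac{\partial^2 R_{XX}(t_1, t_2)}{\partial t_1\, \partial t_2} \qquad (2.26)$$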


If $X(t)$ is stationary, and utilizing $\tau = t_2 - t_1$ as well as Eqs. (2.24)-(2.26),
we get:

$$R_{X'X'}(\tau) = -\frac{d^2 R_{XX}(\tau)}{d\tau^2} \qquad (2.27)$$

From which:

$$R_{X'X'}(0) = E[|X'(t)|^2] = -\frac{d^2 R_{XX}(0)}{d\tau^2} \qquad (2.28)$$



2.7 ERGODICITY AND STOCHASTIC INTEGRALS 

In order to obtain the complete statistics of a process, the ensemble of 
sample functions is needed. Loosely speaking, a process is called ergodic if the 
complete statistics can be determined from any of the sample functions in the 
ensemble. Thus, a single member of the ensemble is assumed to represent the 
entire ensemble. Before giving a basic definition of ergodicity, the concept of 
stochastic integration is needed. Thus, we shall talk about the stochastic inte- 
grals. 


2.8 STOCHASTIC INTEGRALS IN QUADRATIC MEAN

For the great majority of applications, we do not need the most general
form of the stochastic integrals. Thus, we shall only consider two cases of
integrals: Riemann integrals of the form:

$$A_1 = \int_a^b g(t)X(t)\, dt \qquad (2.29)$$

and Stieltjes integrals of the form:

$$A_2 = \int_a^b g(t)\, dX(t) \qquad (2.30)$$

where $[a, b]$ is a closed interval and is finite, $g(t)$ is a deterministic function,
and $X(t)$ is a random process. For the sake of simplicity, assume $EX(t) =
0 = m(t)$. Thus,

$$R_X(t, u) = C_X(t, u)$$

Suppose $I = [a, b]$ is finite, and let the points $a_1, a_2, \ldots, a_{m+1}$ define a
partition, that is:

$$a = a_1 < a_2 < \cdots < a_{m+1} = b$$

Let $S_1$ and $S_2$ denote the sums corresponding to $A_1$ and $A_2$, respectively:

$$S_1 = \sum_{j=1}^{m} g(a_j)X(a_j)(a_{j+1} - a_j) \qquad (2.31)$$

$$S_2 = \sum_{j=1}^{m} g(a_j)[X(a_{j+1}) - X(a_j)] \qquad (2.32)$$
Since $S_1$ and $S_2$ are summations of random variables, $S_1$ and $S_2$ are also
random variables with $E(S_1) = E(S_2) = 0$ (because $EX(t) = 0$ by assumption
for all $t$).

Now as $m \to \infty$ and the maximum of $(a_{j+1} - a_j) \to 0$, the limits of $S_1$ and
$S_2$ exist (in the quadratic mean), that is,

$$A_1 = \underset{\text{q.m.}}{\lim}\; S_1 \qquad (2.33)$$

$$A_2 = \underset{\text{q.m.}}{\lim}\; S_2 \qquad (2.34)$$

where

$$m \to \infty \text{ and } \max\,(a_{j+1} - a_j) \to 0 \qquad (2.35)$$


Remark 6. From the above, we mean:

$$\lim E\{|A_1 - S_1|^2\} = 0$$

and

$$\lim E\{|A_2 - S_2|^2\} = 0$$

whenever condition (2.35) is satisfied.

Claim 3. It can be verified easily that if $R(t, u)$ is continuous over $[a, b] \times
[a, b]$, and if $g(t)$ is such that the Riemann integral:

$$W_1 = \int_a^b\int_a^b g(t)\, g^*(u)\, R(t, u)\, dt\, du \qquad (2.36)$$

exists, then the integral $A_1$ exists in the quadratic mean (q.m.) and

$$E|A_1|^2 = W_1 \text{ and } E(A_1) = 0 \qquad (2.37)$$

Remember that $E(A_1) = 0$ (this was shown above).

Claim 4. Also, if $R(t, u)$ is of bounded variation ($|R(t, u)|$ has a finite
number of maxima and minima over $[a, b] \times [a, b]$), and if $g(t)$ is such
that the Stieltjes integral:

$$W_2 = \int_a^b\int_a^b g(t)\, g^*(u)\, dR(t, u) \qquad (2.38)$$

exists, then $A_2$ exists and

$$E|A_2|^2 = W_2 \text{ and } E(A_2) = 0 \qquad (2.39)$$

To prove (2.37) and (2.39), we consider another partition of $[a, b]$:

$$a = u_1 < u_2 < \cdots < u_{m'+1} = b$$

and we let $S_1'$ and $S_2'$ represent the sums corresponding to (2.31) and (2.32);
then we can show (by utilizing the definitions) that:

$$E(S_1 S_1'^*) \to \int_a^b\int_a^b g(t)\, g^*(u)\, R(t, u)\, dt\, du \qquad (2.40)$$

where

$$m \to \infty,\; m' \to \infty \text{ and } \max\,(a_{j+1} - a_j) \to 0 \text{ and } \max\,(u_{k+1} - u_k) \to 0 \qquad (2.41)$$

Similarly,

$$E(S_2 S_2'^*) \to \int_a^b\int_a^b g(t)\, g^*(u)\, dR(t, u) \qquad (2.42)$$

as condition (2.41) is satisfied.
Remark 7. We have assumed that $S_1$ and $S_2$ converge in the q.m. It is
easily shown that the limit in each case will be independent of the particular
partition chosen.

Remark 8. If either $S_1$ or $S_2$ converges as $a \to -\infty$ and $b \to \infty$, then the
limiting integrals are defined accordingly.

Remark 9. Since

$$A_1 = \int_a^b g(t)X(t)\, dt$$

then,

$$E|A_1|^2 = E[A_1 A_1^*] = E\left[\int_a^b g(t)X(t)\, dt \int_a^b g^*(u)X^*(u)\, du\right] = E\left[\int_a^b\int_a^b g(t)\, g^*(u)\, X(t)X^*(u)\, dt\, du\right]$$

If we let the "expected value" $E$ operate on the integrand, we would get the
result given by (2.40). However, we can only do this if the appropriate conditions
are satisfied.


Example 8 

Let $g(t) = 1$ and $X(t)$ be a continuous real process on $[a, b]$; define:

$$q = \int_a^b X(t)\, dt$$

Find the mean and the variance of $q$. It is easy to show that the conditions of
Claim 3 are satisfied ($m(t)$ may not be zero, which was assumed for convenience
in Claim 3).
Solution

$$Eq = E\left[\int_a^b X(t)\, dt\right] = \int_a^b E(X(t))\, dt = \int_a^b m(t)\, dt \qquad (2.43)$$

Now we need to calculate $E(q^2)$, since $\sigma_q^2 = E(q^2) - E^2 q$:

$$q^2 = \int_a^b\int_a^b X(t)X(u)\, dt\, du$$

Again, the conditions of Claim 3 are satisfied; thus,

$$E(q^2) = E\left[\int_a^b\int_a^b X(t)X(u)\, dt\, du\right] = \int_a^b\int_a^b R(t, u)\, dt\, du$$

Thus, the variance becomes:

$$\sigma_q^2 = \int_a^b\int_a^b [R(t, u) - m(t)m(u)]\, dt\, du = \int_a^b\int_a^b C(t, u)\, dt\, du \qquad (2.44)$$

Example 9

In Example 8, let

$$q = \frac{1}{2T}\int_{-T}^{T} X(t)\, dt$$

and assume $X(t)$ is stationary (wide sense); find $\sigma_q^2$.
Solution

From Eq. (2.43), we get:

$$Eq = \frac{1}{2T}\int_{-T}^{T} m\, dt = m = \text{constant}$$

From Eq. (2.44), we get:

$$\sigma_q^2 = \frac{1}{4T^2}\int_{-T}^{T}\int_{-T}^{T} C(t - u)\, dt\, du \qquad (2.45)$$

Equation (2.45) can be simplified much further.

Before proceeding with the simplification, let us review some simple mathematics
(coordinate transformation). Let $g_1$ and $g_2$ be continuous (real) functions,
such that:

$$x = g_1(w, z)$$

$$y = g_2(w, z)$$

[Figure: the mapping $(g_1, g_2)$ between the region $D'$ and the region $D$.]

For example, $(g_1, g_2)$ maps $D'$ onto $D$. Then the following well-known result
is satisfied:

$$\iint_D f(x, y)\, dx\, dy = \iint_{D'} f(g_1(w, z), g_2(w, z)) \left|\frac{\partial(x, y)}{\partial(w, z)}\right| dw\, dz \qquad (2.46)$$
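for any continuous real function $f(\cdot, \cdot)$; $\partial(x, y)/\partial(w, z)$ is the determinant of
the Jacobian matrix:

$$\begin{bmatrix} \dfrac{\partial x}{\partial w} & \dfrac{\partial x}{\partial z} \\[2mm] \dfrac{\partial y}{\partial w} & \dfrac{\partial y}{\partial z} \end{bmatrix}$$

where the entries are continuous.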


Application of the Above 

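Let $t_1 = t - u$ and $t_2 = t + u$. (This corresponds to a rotation of the axes
by 45° and a scale change of $\sqrt{2}$.) The Jacobian determinant is:

$$\frac{\partial(t_1, t_2)}{\partial(t, u)} = \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} = 2$$

Thus,

$$\frac{\partial(t, u)}{\partial(t_1, t_2)} = \frac{1}{2}$$

Hence,

$$\int_{-T}^{T}\int_{-T}^{T} C(t - u)\, dt\, du = \frac{1}{2}\int_{-2T}^{2T}\int_{-2T + |t_1|}^{2T - |t_1|} C(t_1)\, dt_2\, dt_1$$

$$= \frac{1}{2}\int_{-2T}^{2T} 2(2T - |t_1|)\, C(t_1)\, dt_1$$

$$= \int_{-2T}^{2T} (2T - |t_1|)\, C(t_1)\, dt_1$$

$$= \int_{-2T}^{2T} (2T - |\tau|)\, C(\tau)\, d\tau$$

where $t_1$ and $\tau$ are dummy variables.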
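Using this last result on Eq. (2.45) yields (dividing by $4T^2$):

$$\sigma_q^2 = \frac{1}{2T}\int_{-2T}^{2T} \left(1 - \frac{|\tau|}{2T}\right) C(\tau)\, d\tau \qquad (2.47)$$

Equation (2.47) is true for the complex $X(t)$ as well; however, for the real
case, Eq. (2.47) further reduces to:

$$\sigma_q^2 = \frac{1}{T}\int_{0}^{2T} \left(1 - \frac{\tau}{2T}\right) C(\tau)\, d\tau \qquad (2.48)$$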
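
The reduction from the double integral (2.45) to the single integral (2.47) can be checked numerically. The sketch below assumes, purely for illustration, the covariance $C(\tau) = \exp(-|\tau|)$ and a midpoint-rule quadrature; the function names and grid sizes are hypothetical. For this particular covariance the single integral can also be evaluated by hand, giving $\sigma_q^2 = \frac{1}{T}\left[1 - \frac{1 - e^{-2T}}{2T}\right]$.

```python
import math


def cov(tau):
    """Assumed covariance for the illustration: C(tau) = exp(-|tau|)."""
    return math.exp(-abs(tau))


def variance_double_integral(T, n=400):
    """Midpoint-rule evaluation of Eq. (2.45): (1/4T^2) * double integral of C(t - u)."""
    h = 2.0 * T / n
    pts = [-T + (i + 0.5) * h for i in range(n)]
    total = sum(cov(t - u) for t in pts for u in pts)
    return total * h * h / (4.0 * T * T)


def variance_single_integral(T, n=4000):
    """Midpoint-rule evaluation of Eq. (2.47): (1/2T) * integral of (1 - |tau|/2T) C(tau)."""
    h = 4.0 * T / n
    total = 0.0
    for i in range(n):
        tau = -2.0 * T + (i + 0.5) * h
        total += (1.0 - abs(tau) / (2.0 * T)) * cov(tau)
    return total * h / (2.0 * T)


def variance_closed_form(T):
    """Hand evaluation of (2.48) for C(tau) = exp(-|tau|)."""
    return (1.0 - (1.0 - math.exp(-2.0 * T)) / (2.0 * T)) / T


if __name__ == "__main__":
    T = 1.5  # hypothetical averaging half-width
    print("double integral (2.45):", variance_double_integral(T))
    print("single integral (2.47):", variance_single_integral(T))
    print("closed form           :", variance_closed_form(T))
```

The three numbers agree to within the quadrature error, which is what the change of variables $t_1 = t - u$, $t_2 = t + u$ predicts.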


2.9 DEFINITION OF ERGODICITY 

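Let $X(t)$ be a stationary process and assume that:

$$\lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)\, dt$$

exists in the q.m. We say $X(t)$ is ergodic if:

$$\frac{1}{2T}\int_{-T}^{T} X(t)\, dt \xrightarrow[T \to \infty]{\text{q.m.}} m \qquad (2.49)$$

That is,

$$E\left\{\left|\frac{1}{2T}\int_{-T}^{T} X(t)\, dt - m\right|^2\right\} \to 0, \quad \text{as } T \to \infty$$

Writing $q = \frac{1}{2T}\int_{-T}^{T} X(t)\, dt$, we have $Eq = m$,
and utilizing Eq. (2.48) the variance of $q$ is given by:

$$\sigma_q^2 = \frac{1}{T}\int_{0}^{2T} \left(1 - \frac{\tau}{2T}\right) C(\tau)\, d\tau \qquad (2.50)$$

Thus, it is obvious that $X(t)$ is ergodic in the quadratic mean if and only if
(see the above equation) the following is satisfied:

$$\frac{1}{T}\int_{0}^{2T} \left(1 - \frac{\tau}{2T}\right) C(\tau)\, d\tau \to 0, \quad \text{as } T \to \infty \qquad (2.51)$$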
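
Condition (2.51) is easy to explore numerically. The sketch below evaluates the left-hand side of (2.51) for two assumed covariances: $C(\tau) = \exp(-|\tau|)$, for which the quantity decays roughly like $1/T$, and $C(\tau) = 1$ (as would arise if $X(t)$ were a single random constant with unit variance), for which it stays equal to 1. The function name, the quadrature, and both covariance choices are illustrative assumptions.

```python
import math


def ergodicity_integral(cov, T, n=20_000):
    """Midpoint-rule value of (1/T) * integral_0^{2T} (1 - tau/2T) C(tau) dtau."""
    h = 2.0 * T / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * h
        total += (1.0 - tau / (2.0 * T)) * cov(tau)
    return total * h / T


def decaying_cov(tau):
    """Assumed covariance that dies out with lag."""
    return math.exp(-abs(tau))


def constant_cov(tau):
    """Assumed covariance that never dies out (random-constant process)."""
    return 1.0


if __name__ == "__main__":
    for T in (1.0, 10.0, 100.0):
        print(f"T={T}: decaying {ergodicity_integral(decaying_cov, T):.5f}, "
              f"constant {ergodicity_integral(constant_cov, T):.5f}")
```

In the first case the time average converges to $m$ in the quadratic mean; in the second case it does not, since a single sample function carries no information about the spread of the ensemble.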
EXERCISES

2.1 Sketch a few samples of the process $X(t)$ given by:

$$X(t) = A\sin(\omega t + \theta)$$

(a) If $A$ is a random variable uniformly distributed over $[-1, 1]$.

(b) If $\omega$ is random and uniformly distributed over $[0, \pi]$.

(c) If $\theta$ is random and uniformly distributed over $(0, 2\pi)$.

2.2 Obtain the mean and the variance of each process in Problem 2.1.

2.3 Let the sample functions of the process $X(t)$ be given by:

$$x(t) = a\cos(\omega_0 t + \theta)$$

Assume $a$ is deterministic and $\theta$ is a value of the random variable $\Theta$,
where $\Theta$ is uniformly distributed over $[0, \pi/2]$. Find the mean, variance,
and the autocorrelation function of $X(t)$.

2.4 Let the sample functions of a process $X(t)$ be given by:

$$x(t) = \cos(\omega_0 t + \theta)$$

where $\theta$ is uniformly distributed over $[0, 2\pi]$. Obtain the p.d.f. of the
process, and comment on the stationarity of the process (in the wide
sense).


2.5 Let $Z(t) = X(t)Y(t)$ be real processes. Assume that $X(t)$ and $Y(t)$ are
independent stationary processes (wide sense); then:

(a) Obtain $R_Z(\tau) = R_X(\tau)R_Y(\tau)$.

(b) If the processes $P(t) = X(t) - m_X$ and $Q(t) = Y(t) - m_Y$ have the
corresponding

$$R_P(\tau) = \exp(-a|\tau|)$$

and

$$R_Q(\tau) = \exp(-b|\tau|)$$

where $a$ and $b$ are both positive, then obtain $R_Z(\tau)$.

2.6 Let $X(t)$ be a wide-sense stationary random process with no periodic
components. Assume $X(t)$ and $X(t + \tau)$ are uncorrelated as $|\tau|$ becomes
large. Show:

$$R_X(\tau) \to m_X^2, \quad \text{as } |\tau| \to \infty$$

2.7 If $X(t)$ and $Y(t)$ are independent random wide-sense stationary processes
and $Z(t)$ and $W(t)$ are such that:

$$Z(t) = X(t) + Y(t); \quad W(t) = 2X(t) + Y(t)$$

Then find $R_Z(\tau)$, $R_W(\tau)$, $R_{ZW}(\tau)$, and $R_{WZ}(\tau)$.

2.8 Consider the process $X(t) = f(t)Y$, where $f(t)$ is a deterministic complex
function (non-random), and $Y$ is a random variable. Assume that we have
a constraint on $X(t)$ such that $X(t)$ is of mean zero and is wide-sense
stationary. Then perform the following:

(a) Determine the restriction on $f(t)$.

(b) Obtain the most general form of $f(t)$ that satisfies the requirement.

2.9 A process $Y(t)$ satisfies:

$$\dot{Y}(t) + Y(t) = X(t), \quad t \ge 0$$

where $Y(0) = 2$, $m_X = 1$, and $R_X(\tau) = 1 + \exp(-|\tau|)$. Find the following:

(a) $m_Y$.

(b) $R_{XY}(t_1, t_2)$, for $t_1$ and $t_2 > 0$.

(c) $R_{YY}(t_1, t_2)$, for $t_1$ and $t_2 > 0$.

(d) Comment on the stationarity of $R_{YY}$.
2.10 Assume $C_X(\tau)$ of the process $X(t)$ satisfies:

$$\int_{-\infty}^{\infty} |C_X(\tau)|\, d\tau < \infty$$

Show that

$$\underset{T \to \infty}{\text{q.m. lim}}\; \frac{1}{2T}\int_{-T}^{T} X(t)\, dt = m$$


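2.11 Given the processes $X(t)$ and $N(t)$ such that

$$X(t) = b + N(t)$$

where $b$ is a constant, $E(N) = 0$, and $N$ is stationary, show that if $b$ is
estimated via

$$\hat{b} = \frac{1}{T}\int_0^T X(t)\, dt$$

it will satisfy

$$E(\hat{b}) = b$$

and

$$\text{variance } \sigma_{\hat{b}}^2 = \frac{2}{T}\int_0^T \left(1 - \frac{\tau}{T}\right) R_N(\tau)\, d\tau$$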



CHAPTER 3 

POWER SPECTRUM OF 
STATIONARY PROCESSES 


Before discussing the power spectrum, which is defined for wide-sense
stationary processes, we need to familiarize ourselves with some basic concepts and
definitions.


3.1 CLASSIFICATION OF SYSTEMS 

Heuristically speaking, a system refers to a modeling of a physical phe- 
nomenon (which is idealized in some sense). We shall visualize a system via a 
black box which has many inputs and many outputs (vector input-output). 


INPUTS 


OUTPUTS 




The input-output relation is often indicated symbolically by:

$$Y(t) = L[U(t)] \qquad (3.1)$$

where $U(t)$ is a vector-valued input, $Y(t)$ is a vector-valued output, and $L$ is an
"operator" relating the input to the output. The operator $L$ depends on the
particular physical model.
Definition 1

We say the system is linear if the operator $L$ is linear, i.e., the following
conditions are satisfied:

$$L(aU) = aL(U) \qquad (3.2)$$

where $a$ is any scalar, and

$$L(U_1 + U_2) = L(U_1) + L(U_2) \qquad (3.3)$$

for any inputs $U_1$ and $U_2$. Equivalently, Eqs. (3.2) and (3.3) can be combined
into one equation:

$$L(\alpha U_1 + \beta U_2) = \alpha L(U_1) + \beta L(U_2) \qquad (3.4)$$

for any pair of scalars $\alpha$ and $\beta$.

In the following examples, assume the inputs and the outputs are one-dimensional.


Example 1 

Consider

$$y(t) = \frac{du(t)}{dt}$$

We know $L = d/dt$ and the conditions of linearity are satisfied.

Example 2

$y(t) = u^2(t)$ does not correspond to a linear system since:

$$L[\alpha u_1(t) + \beta u_2(t)] = [\alpha u_1(t) + \beta u_2(t)]^2$$

$$\ne \alpha L(u_1(t)) + \beta L(u_2(t)) = \alpha u_1^2(t) + \beta u_2^2(t)$$
Example 3

Consider the electric circuit given below.

Let $v(t)$ be the input and $i(t)$ be the output. Then, the output is given by:

$$i(t) = \frac{v(t)}{R} \qquad (3.5)$$

It is easy to verify that the system is linear.

Example 4

In the previous example change $R$ to an inductor $L$ and assume $i(-\infty) = 0$.
Then,

$$i(t) = \frac{1}{L}\int_{-\infty}^{t} v(\lambda)\, d\lambda \qquad (3.6)$$

and the system is also linear (left as an exercise).

Example 5 

Consider a system given by:

$$y(t) = au(t) + b$$

where $a \ne 0$ and $b \ne 0$ are scalars. The system is nonlinear! This is true
because:

$$L(u_1(t) + u_2(t)) = a(u_1(t) + u_2(t)) + b \ne Lu_1(t) + Lu_2(t)$$

The system will become linear if $b = 0$.
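
Condition (3.4) lends itself to a simple numerical spot check for instantaneous systems such as those of Examples 2, 3, and 5. The sketch below is illustrative only: the function names, the resistance value, and the coefficients of the affine system are assumptions. Note that a random test of this kind can refute linearity but cannot prove it.

```python
import random


def satisfies_superposition(system, n_trials=1000, tol=1e-9, seed=4):
    """Spot-check L(alpha*u1 + beta*u2) == alpha*L(u1) + beta*L(u2) on random values."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        u1, u2 = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        alpha, beta = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        lhs = system(alpha * u1 + beta * u2)
        rhs = alpha * system(u1) + beta * system(u2)
        if abs(lhs - rhs) > tol * max(1.0, abs(lhs), abs(rhs)):
            return False  # one counterexample is enough to rule out linearity
    return True


def resistor(v):
    """Example 3 with a hypothetical resistance R = 10: i = v / R."""
    return v / 10.0


def squarer(u):
    """Example 2: y = u^2."""
    return u * u


def affine(u):
    """Example 5 with hypothetical a = 3, b = 1: y = a*u + b."""
    return 3.0 * u + 1.0


def pure_gain(u):
    """Example 5 with b = 0: y = a*u."""
    return 3.0 * u


if __name__ == "__main__":
    for name, system in [("resistor", resistor), ("squarer", squarer),
                         ("affine", affine), ("pure gain", pure_gain)]:
        print(name, "passes superposition check:", satisfies_superposition(system))
```

The affine system fails the check for any nonzero offset, which mirrors the remark that the system of Example 5 becomes linear only when $b = 0$.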
Definition 2

A system is called instantaneous if its output at any given time $t$ is at most
a function of the input at the same time.

Definition 3

A system is called dynamic if it is not instantaneous. Example 3 is instantaneous
and Example 4 is dynamic.

Definition 4

A system, whose output at time $t$ is completely determined from the input
in the closed interval $[t - T, t]$, where $T \ge 0$, is said to have a memory $T$.
Thus, if $T \ne 0$, the system is dynamic; otherwise it is instantaneous. In
Example 4, the memory is infinite.

Definition 5

A system is realizable or causal if its output $y(t)$ does not depend on the
future value of the input. Thus, $y(t)$ can be determined from the past (and
the present) information of $u(\lambda)$ (i.e., $\lambda \le t$ and not on $\lambda > t$).


Definition 6 

A dynamic system is said to be lumped if it can be characterized by a set
of differential equations for the continuous case (and difference equations for
the discrete case).

In the classical characterization of a linear system, any lumped linear system
(assume scalar inputs and outputs) can be represented by:

$$y(t) = \int_{-\infty}^{\infty} h(t, \tau)\, u(\tau)\, d\tau \qquad (3.7)$$

where $h(t, \tau) = L\,\delta(t - \tau)$ = response to a unit impulse function applied at
time $\tau$.

If the linear system is causal, then:

$$h(t, \tau) = 0, \quad \text{for } \tau > t \qquad (3.8)$$
Otherwise, $y(t)$ would depend on $u(\tau)$ for $\tau > t$ (future value). Thus, it would
not be realizable. Hence, Eq. (3.7) for causal systems can be written as:

$$y(t) = \int_{-\infty}^{t} h(t, \tau)\, u(\tau)\, d\tau \qquad (3.9)$$


Definition 7 

A system is time-invariant if the time translation of the input causes the
same time translation in the output. That is, if

$$y(t) = L\,u(t)$$

then $u(t - \lambda)$ would correspond to $y(t - \lambda)$.

It is easy to verify in a linear time-invariant system that the impulse
response $h(t, \tau) = L(\delta(t - \tau))$ becomes:

$$h(t, \tau) = L\,\delta(t - \tau) = h(t - \tau) \qquad (3.10)$$

where the two-argument $h$ and the one-argument $h$ are two different functions.

Thus, the linear time-invariant system is entirely specified by a response to
a single unit impulse, which can be applied at any given time $\tau$. For the sake
of simplicity, we shall assume the time $\tau = 0$. Hence,

$$h(t) = L\,\delta(t) \qquad (3.11)$$

For a time-invariant linear system given by Eq. (3.7), one can write:

$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, u(\tau)\, d\tau \qquad (3.12)$$

Equation (3.12) is of a well-known form, called the "convolution integral,"
and it is denoted in the literature by $h * u$. We are going to talk more about
$h * u$ in later sections.
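
The convolution integral (3.12) can be approximated by a Riemann sum. The sketch below assumes, for illustration only, the causal impulse response $h(t) = e^{-t}$ for $t \ge 0$ and a unit step input applied at $t = 0$, for which the output is $y(t) = 1 - e^{-t}$ for $t \ge 0$. The function names, the truncation limits, and the grid size are hypothetical choices.

```python
import math


def impulse_response(t):
    """Assumed causal impulse response: h(t) = exp(-t) for t >= 0, else 0."""
    return math.exp(-t) if t >= 0.0 else 0.0


def unit_step(t):
    """Assumed input: u(t) = 1 for t >= 0, else 0."""
    return 1.0 if t >= 0.0 else 0.0


def convolve_at(t, h, u, lower=-10.0, upper=10.0, n=20_000):
    """Midpoint Riemann sum for y(t) = integral of h(t - tau) u(tau) dtau, Eq. (3.12).

    The infinite limits are truncated to [lower, upper], which is adequate
    here because the integrand vanishes outside 0 <= tau <= t.
    """
    d_tau = (upper - lower) / n
    total = 0.0
    for k in range(n):
        tau = lower + (k + 0.5) * d_tau
        total += h(t - tau) * u(tau)
    return total * d_tau


if __name__ == "__main__":
    for t in (0.5, 2.0, 5.0):
        approx = convolve_at(t, impulse_response, unit_step)
        print(f"t={t}: Riemann sum {approx:.5f}, exact {1.0 - math.exp(-t):.5f}")
```

Because $h(t - \tau) = 0$ for $\tau > t$, only past and present input values contribute to the sum, which is the causality condition (3.8) in action.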

Remark 1. Since the integral given by Eq. (3.12) is the limit of a summation
(definition of Riemann integral), we can think of the output $y(t)$ (signal)
to be resolved into unit impulses. For example, consider a finite interval
$[-T, T]$ and a finite number of unit pulses (steps) with width $\Delta\tau$ occurring at $t = k\Delta\tau$,
for $k = 0, \pm 1, \pm 2, \ldots, \pm N$, where $N = T/\Delta\tau$ (see sketch).

The summation

$$\sum_{k=-N}^{N} y(k\Delta\tau)\, P_{\Delta\tau}(t - k\Delta\tau)\, \Delta\tau$$

where $P_{\Delta\tau}(t - k\Delta\tau)$ is a unit pulse with width $\Delta\tau$. The height of the unit
pulse is $1/\Delta\tau$ to make the pulse area equal to one. As $\Delta\tau \to 0$ and $N \to \infty$,
then, if the limit of the above summation exists, it must be equal to
$y(t)$ given via Eq. (3.12), i.e.,

$$y(t) = \int_{-\infty}^{\infty} h(t - \tau)\, u(\tau)\, d\tau$$


Discussion 

Physical systems are characterized by models consisting of idealized elements.
Choosing an appropriate model which characterizes all features of the
physical system is very important and also very difficult. In general, a model
of the physical system may be expressed mathematically by integro-differential
equations and is generally nonlinear. The complete treatment of
nonlinear systems is extremely difficult; therefore, we try to do the next best
thing: approximate the nonlinear system with a linear system.

The classical method of describing a linear system is by the impulse
response method. Even though the solution of the linear model is known, its
treatment in the time domain for the time-varying case is not simple. If the
linear model is time-invariant, we can use a transformation (such as Laplace or
Fourier) to convert the complicated integro-differential equations into simple
algebraic equations (frequency domain). It is of extreme importance to
emphasize that the transforms can be used to great advantage only in the
time-invariant linear systems. In the nonlinear and time-varying cases the transforms
cannot be utilized to advantage.

It is very easy to imagine a situation where we transmit a random process
$X(t)$ (signal) through a linear or a nonlinear system. However, if $X(t)$ is transmitted
through a time-invariant linear system, we shall use Fourier transforms
to simplify the calculations. The Fourier transform is also used for the decomposition
of signal power, which will be defined in the following sections.


3.2 FREQUENCY SPECTRA AND FOURIER 
TRANSFORMS 

Before developing the concept of the power spectrum of a stationary process,
let us give some intuitive discussion of Fourier transforms and series. If
the reader is not familiar with these concepts, he is advised to review Appendices
C and D. In this section, however, a relatively non-rigorous approach is
adopted for intuitive appeal only.

Let us start by asking ourselves the following question: Is there an input
signal which will pass through a time-invariant system without changing shape?
The answer is "yes" and is an exponential function $\exp(\lambda t)$, where $\lambda$ is, in
general, a complex constant. If we choose a special form of $\exp(\lambda t)$, namely,
$\exp(j\omega t)$, then the output $y(t)$ would be proportional to the input, i.e., $y(t)
= H(j\omega)\exp(j\omega t)$, where $H(j\omega)$ is the so-called "system function." Since the
characterization of the exponential functions of the general form $\exp(\lambda t)$ (or
$\exp(j\omega t)$) is very simple, it is desired to resolve any general function $f(t)$ in
terms of the exponentials whenever possible. Obviously, one such case is the
representation of a periodic signal $f(t)$ in terms of $\exp(j\omega t)$ (Fourier series).
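
The claim that $\exp(j\omega t)$ passes through a time-invariant linear system unchanged in shape can be checked on a concrete case. The sketch below assumes, for illustration only, a system with impulse response $h(t) = e^{-t}$ for $t \ge 0$; for that system the convolution of $h$ with $\exp(j\omega t)$ evaluates to $H(j\omega)\exp(j\omega t)$ with $H(j\omega) = 1/(1 + j\omega)$. The function names, the truncation point `s_max`, and the grid size are hypothetical choices.

```python
import cmath
import math


def response_to_exponential(omega, t, s_max=40.0, n=40_000):
    """Midpoint-rule value of y(t) = integral_0^inf h(s) exp(j*omega*(t - s)) ds.

    Uses the assumed impulse response h(s) = exp(-s), s >= 0; the upper limit
    is truncated at s_max, where exp(-s) is negligible.
    """
    ds = s_max / n
    total = 0j
    for k in range(n):
        s = (k + 0.5) * ds
        total += math.exp(-s) * cmath.exp(1j * omega * (t - s))
    return total * ds


def system_function(omega):
    """H(j*omega) = 1 / (1 + j*omega) for the assumed impulse response."""
    return 1.0 / (1.0 + 1j * omega)


if __name__ == "__main__":
    omega, t = 2.0, 0.7  # hypothetical frequency and observation time
    y = response_to_exponential(omega, t)
    expected = system_function(omega) * cmath.exp(1j * omega * t)
    print("numerical output   :", y)
    print("H(jw) * exp(jwt)   :", expected)
```

The output differs from the input only by the complex factor $H(j\omega)$, which rescales the amplitude and shifts the phase but leaves the exponential form intact.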


A periodic signal $f(t)$ (not yet a random process) with a period $T$ under a
set of conditions (Dirichlet, see Appendix C) may be resolved into a series of
complex functions over $[-T/2, T/2]$. The resolution is given by:

$$f(t) = \sum_{n=-\infty}^{\infty} C_n \exp(jn\omega_0 t) \qquad (3.13)$$

where $\omega_0 = 2\pi/T$, $t \in (-T/2, T/2)$, and the values of $C_n$ are given by:

$$C_n = \frac{1}{T}\int_{-T/2}^{T/2} f(t)\exp(-jn\omega_0 t)\, dt \qquad (3.14)$$
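
As an illustration of (3.14), the coefficients of a simple periodic signal can be computed numerically. The sketch below assumes a rectangular pulse equal to 1 for $|t| < T/4$ and 0 elsewhere in the period, for which direct integration gives $C_0 = 1/2$ and $C_n = \sin(n\pi/2)/(n\pi)$ for $n \ne 0$. The function names, the period, and the grid size are illustrative assumptions.

```python
import cmath
import math


def fourier_coefficient(f, n, T, cells=4000):
    """Midpoint-rule value of C_n = (1/T) * integral_{-T/2}^{T/2} f(t) exp(-j n w0 t) dt."""
    w0 = 2.0 * math.pi / T
    h = T / cells
    total = 0j
    for k in range(cells):
        t = -T / 2.0 + (k + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * w0 * t)
    return total * h / T


def square_pulse(t, T=2.0):
    """Assumed signal over one period: 1 for |t| < T/4, 0 otherwise."""
    return 1.0 if abs(t) < T / 4.0 else 0.0


if __name__ == "__main__":
    T = 2.0  # hypothetical period
    for n in range(-3, 4):
        c = fourier_coefficient(square_pulse, n, T)
        print(f"n={n:+d}: C_n = {c.real:+.5f} {c.imag:+.5f}j, |C_n| = {abs(c):.5f}")
```

For this real signal the printed values satisfy $C_{-n} = C_n^*$, so the amplitudes $|C_n|$ are even in $n$ and the phases are odd.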
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Recall in Eq. (3.14) that C n is, in general, complex and can be written as: 


C -ICUxpW.I 


(3.15) 


where C and 9 are functions of * «w A . 

It If V 

The essential information about the harmonics in a periodic signal consists
of the magnitudes, phase angles, and frequencies. It is easy to see that all the
information about $f(t)$ is incorporated in $C_n$ and $\omega_0 = 2\pi/T$, since
once these quantities are known, so is $f(t)$. The real amplitudes $|C_n|$ and
the phases $\theta_n$ can be represented graphically as a function of
$\omega_n = n\omega_0$, $n = 0, \pm 1, \pm 2, \ldots$. The collection of the
graphs is called the frequency spectra (discrete). Typical amplitude and phase
spectra are shown in Fig. 3-1. It is easily verified that $|C_n|$ is an even
function of $\omega$, and $\theta_n$ is an odd function of $\omega$ (left as an
exercise). The reader may verify for himself that, for real signals $f(t)$,

$$ C_{-n} = C_n^{*} $$

Fig. 3-1. Typical Phase and Amplitude Spectra
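The symmetry of the discrete spectra can be checked numerically. The following Python sketch approximates the coefficients of Eq. (3.14) with a midpoint rule and compares the coefficients of index n and -n for a real signal. The test signal `ramp`, the period `T = 2`, and the helper name `fourier_coefficient` are illustrative assumptions and do not come from the text.

```python
import cmath
import math


def fourier_coefficient(f, n, period, steps=20000):
    """Midpoint-rule approximation of Eq. (3.14):
    C_n = (1/T) * integral over [-T/2, T/2] of f(t) exp(-j n w0 t) dt,
    with w0 = 2*pi/T."""
    w0 = 2.0 * math.pi / period
    dt = period / steps
    total = 0j
    for k in range(steps):
        t = -period / 2.0 + (k + 0.5) * dt
        total += f(t) * cmath.exp(-1j * n * w0 * t)
    return total * dt / period


def ramp(t):
    # Hypothetical real test signal on one period: 0 for t <= 0, t for t > 0.
    return t if t > 0.0 else 0.0


T = 2.0
for n in range(0, 4):
    c_pos = fourier_coefficient(ramp, n, T)
    c_neg = fourier_coefficient(ramp, -n, T)
    # For a real signal the magnitudes agree and the phases have opposite sign.
    print(n, abs(c_pos), abs(c_neg), cmath.phase(c_pos), cmath.phase(c_neg))
```

The midpoint rule is used only because it needs no external library; any quadrature rule would serve for this check.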


3.2.1 The Fourier Transform 

Now suppose that the function $f(t)$ is defined over the infinite interval
$(-\infty, \infty)$ and that it is no longer periodic. Then it is still
possible, under certain conditions, to resolve the nonperiodic function into
complex exponential functions of the form $\exp(j\omega t)$. The intuitive
argument is to reduce the spacing $\omega_0$ between the components of a
periodic signal. Denote the spacing by $\Delta\omega = \omega_0 = 2\pi/T$
(radians per second). We shall continue to consider $|C_n|$ as a discrete
function of $n\omega_0$. Since (see Eq. 3.14) $|C_n| \to 0$ as $T \to \infty$,
we shall define a new variable $G(jn\omega_0) = G(jn\Delta\omega)$:

$$ G(jn\Delta\omega) = T\,C_n = \int_{-T/2}^{T/2} f(t)\exp(-jn\Delta\omega\,t)\,dt $$

As $T \to \infty$ and $\Delta\omega \to 0$, $n\Delta\omega$ approaches a
continuous variable $\omega$ and:

$$ G(\omega) = \int_{-\infty}^{\infty} f(t)\exp(-j\omega t)\,dt \tag{3.16} $$

and $f(t)$ can be written as:

$$ f(t) = \lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty} C_n\exp(jn\omega_0 t)
        = \lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty} G(jn\Delta\omega)\exp(jn\Delta\omega\,t)\,\frac{\Delta\omega}{2\pi} $$

As $\Delta\omega \to 0$, $n\Delta\omega$ approaches a continuous variable
$\omega$, such that

$$ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\omega)\exp(j\omega t)\,d\omega \tag{3.17} $$

Equations (3.16) and (3.17) are called the Fourier transform pair. Equation
(3.16) is, in general, a complex function of $\omega$. As an exercise the reader
can show that for real functions $f(t)$:

$$ G^{*}(\omega) = G(-\omega) \tag{3.18} $$

Also, the reader will find it instructive to verify the transform pairs given in the
appendix on Fourier transforms.

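If we use $f = \omega/2\pi$, and let $P(f) = G(2\pi f)$, then

$$ P(f) = G(2\pi f) = \int_{-\infty}^{\infty} f(t)\exp(-j2\pi f t)\,dt \tag{3.19} $$

and

$$ f(t) = \int_{-\infty}^{\infty} P(f)\exp(j2\pi f t)\,df \tag{3.20} $$

Thus,

$$ \int_{-\infty}^{\infty} P(f)\exp(j2\pi f t)\,df
   = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\omega)\exp(j\omega t)\,d\omega \tag{3.21} $$

Equations (3.19) and (3.20) are also called the Fourier transform pair.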


3.3 POWER SPECTRA 

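We know that if $G(\omega)$ corresponding to the nonperiodic function $f(t)$
exists, then we can verify (see Appendix C) that:

$$ \int_{-\infty}^{\infty} |f(t)|^2\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |G(\omega)|^2\,d\omega \tag{3.22} $$

holds (Parseval's relation for Fourier transform).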
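Parseval's relation can be checked numerically on a simple pair. The Python sketch below assumes the test signal exp(-|t|) and its transform 2/(1 + w^2) under the convention of Eq. (3.16); this pair, the finite integration limits, and the helper `midpoint_integral` are assumptions chosen for illustration and are not taken from the text.

```python
import math


def midpoint_integral(g, a, b, steps):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def f(t):
    # Assumed test signal: two-sided exponential.
    return math.exp(-abs(t))


def G(w):
    # Its Fourier transform under the convention of Eq. (3.16).
    return 2.0 / (1.0 + w * w)


# Left and right sides of Eq. (3.22), each truncated to a finite interval.
energy_time = midpoint_integral(lambda t: f(t) ** 2, -40.0, 40.0, 80000)
energy_freq = midpoint_integral(lambda w: G(w) ** 2, -400.0, 400.0, 400000) / (2.0 * math.pi)
print(energy_time, energy_freq)
```

Both numbers should agree closely, since each side of Eq. (3.22) equals 1 for this pair; the frequency integral needs the wider interval because its integrand decays only algebraically.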

Let $f(t)$, for example, represent the voltage across a resistance of 1 ohm.
Then the instantaneous power $p(t)$ is defined by $p(t) = v(t)\,i(t)$, where
$v(t)$ is the voltage and $i(t)$ is the current through the resistance. Thus,
the dissipated energy in the resistance (which is the integral of $p(t)$) is
given by:

$$ \int_{-\infty}^{\infty} |v(t)|^2\,dt = \int_{-\infty}^{\infty} |f(t)|^2\,dt
   = \frac{1}{2\pi}\int_{-\infty}^{\infty} |G(\omega)|^2\,d\omega \tag{3.23} $$

The average power $P_{AV}$ is defined by:

$$ P_{AV} = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} |f(t)|^2\,dt \tag{3.24} $$

It is possible for the total energy to be infinite and the average power to be
finite. Note that $|G(\omega)|^2$ from Eq. (3.23) represents the energy density
spectrum, except for the constant $1/2\pi$.


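Now let us consider $X(t)$ to be a real stationary random process. Define
$X_T(t)$ such that

$$ X_T(t) = \begin{cases} X(t), & |t| < T \\ 0, & |t| > T \end{cases} \tag{3.25} $$

and let its Fourier transform be denoted by $X_T(\omega)$, i.e.,

$$ X_T(\omega) = \int_{-\infty}^{\infty} X_T(t)\exp(-j\omega t)\,dt
              = \int_{-T}^{T} X(t)\exp(-j\omega t)\,dt \tag{3.26} $$

We can see that, as $T \to \infty$, the signal $X_T(t) \to X(t)$. Utilizing
Eq. (3.24), the average power of $X(t)$ for $t \in [-T, T]$ is given by:

$$ \frac{1}{2T}\int_{-T}^{T} [X(t)]^2\,dt
   = \int_{-\infty}^{\infty} \frac{|X_T(\omega)|^2}{2T}\,\frac{d\omega}{2\pi} $$

where, from Eq. (3.23), $|X_T(\omega)|^2/(2T)$ represents the power spectral
density. However, the power spectrum of $X(t)$ is defined as:

$$ S(\omega) = \lim_{T\to\infty}\frac{1}{2T}\,E\left[|X_T(\omega)|^2\right] \tag{3.27} $$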


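Now $S(\omega)$, by utilizing Eqs. (3.26) and (3.27), becomes:

$$ S(\omega) = \lim_{T\to\infty}\frac{1}{2T}\,E\left[X_T(\omega)\,X_T^{*}(\omega)\right] $$

$$ = \lim_{T\to\infty}\frac{1}{2T}\,E\left\{\left[\int_{-T}^{T} X(t)\exp(-j\omega t)\,dt\right]
     \left[\int_{-T}^{T} X(u)\exp(j\omega u)\,du\right]\right\} $$

The above equation can also be written as:

$$ S(\omega) = \lim_{T\to\infty}\frac{1}{2T}\left[\int_{-T}^{T}\int_{-T}^{T}
     R_X(t-u)\exp(-j\omega(t-u))\,dt\,du\right] $$

where, from Example 9 of Chapter 2, we get:

$$ S(\omega) = \lim_{T\to\infty}\int_{-2T}^{2T} R_X(\tau)\left[1 - \frac{|\tau|}{2T}\right]\exp(-j\omega\tau)\,d\tau
            = \int_{-\infty}^{\infty} R_X(\tau)\exp(-j\omega\tau)\,d\tau \tag{3.28} $$

Thus, for a stationary process, $S(\omega)$ is the Fourier transform of
$R_X(\tau)$:

$$ R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\exp(j\omega\tau)\,d\omega \tag{3.29} $$

For a real process $X(t)$, $R_X(\tau) = R_X(-\tau)$, and Eq. (3.29) becomes:

$$ R_X(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,(\cos\omega\tau + j\sin\omega\tau)\,d\omega
            = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\cos\omega\tau\,d\omega
            = \frac{1}{\pi}\int_{0}^{\infty} S(\omega)\cos\omega\tau\,d\omega \tag{3.30} $$

Definition 8

The power spectrum of any stationary random process $X(t)$ (real or complex)
is denoted by $S(\omega)$ and is given by:

$$ S(\omega) = \int_{-\infty}^{\infty} R(\tau)\exp(-j\omega\tau)\,d\tau $$

where $R(\tau)$ is related to $S(\omega)$ by Eq. (3.29) for the general complex
case, and Eq. (3.30) corresponds to the real case.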


3.3.1 Examples 

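Before getting involved with the examples, a method of calculation for the
bilateral Laplace transform is discussed. Assume the bilateral Laplace
transform $F_B(s)$ of $f(t)$ exists in some region, say, for
$\sigma_1 < \mathrm{Re}\,s < \sigma_2$. Then,

$$ F_B(s) = \int_{-\infty}^{\infty} f(t)\exp(-st)\,dt
          = \int_{-\infty}^{0} f(t)\exp(-st)\,dt + \int_{0}^{\infty} f(t)\exp(-st)\,dt $$

$$ = \int_{0}^{\infty} f(-t)\exp[-(-s)t]\,dt + \int_{0}^{\infty} f(t)\exp(-st)\,dt $$

$$ = \mathcal{L}[f(-t)]\Big|_{s \to -s} + \mathcal{L}[f(t)] $$

where $\mathcal{L}$ is the one-sided Laplace transform.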
Example 6

Find $F_B(s)$ of $f(t) = \tfrac{1}{2}\exp(-|t|)$.

Solution

For $t > 0$,

$$ f(t) = \tfrac{1}{2}\exp(-t) $$

Hence,

$$ \mathcal{L}[f(t)] = F(s) = \frac{1/2}{s+1}, \quad \text{for } \mathrm{Re}\,s > -1 $$

Now, for $t < 0$,

$$ f(t) = \tfrac{1}{2}\exp(t) $$

which implies that:

$$ \mathcal{L}[f(-t)] = \mathcal{L}\left[\tfrac{1}{2}\exp(-t)\right] \quad (\text{replace } t \text{ by } -t) $$

$$ = \frac{1/2}{s+1} $$

and, replacing $s$ by $-s$,

$$ \frac{1/2}{-s+1}, \quad \text{for } \mathrm{Re}\,s < 1 $$

Therefore,

$$ F_B(s) = \mathcal{L}[f(-t)]\Big|_{s \to -s} + \mathcal{L}[f(t)]
          = \frac{1/2}{-s+1} + \frac{1/2}{s+1} = \frac{1}{1-s^2} $$

and the region of convergence is $-1 < \mathrm{Re}\,s < 1$.

Remark 2. The Fourier transform of $f(t)$ is obtained by replacing $s$
by $j\omega$. Hence, $F(\omega) = 1/(1 + \omega^2)$.
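The result of Example 6 and of Remark 2 can be checked by direct numerical integration. The Python sketch below takes the signal as (1/2) exp(-|t|), the form used in the solution, and evaluates the defining integral at one real value of s inside the region of convergence and at one point on the imaginary axis. The function name `bilateral_laplace`, the truncation limit `t_max`, and the sample points are illustrative assumptions.

```python
import cmath
import math


def f(t):
    # Signal of Example 6, taken here as (1/2) exp(-|t|).
    return 0.5 * math.exp(-abs(t))


def bilateral_laplace(s, t_max=80.0, steps=160000):
    """Midpoint-rule approximation of F_B(s) = integral of f(t) exp(-s t) dt
    over [-t_max, t_max]; meaningful only for -1 < Re s < 1."""
    dt = 2.0 * t_max / steps
    total = 0j
    for k in range(steps):
        t = -t_max + (k + 0.5) * dt
        total += f(t) * cmath.exp(-s * t)
    return total * dt


s = 0.5
print(bilateral_laplace(s), 1.0 / (1.0 - s * s))       # compare with 1/(1 - s^2)
w = 2.0
print(bilateral_laplace(1j * w), 1.0 / (1.0 + w * w))  # compare with 1/(1 + w^2)
```

Outside the strip -1 < Re s < 1 the integrand grows without bound, so the truncated sum would no longer approximate anything meaningful.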

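Example 7

Given the stochastic differential equation:

$$ \dot{x}(t) = -x(t) + n(t) $$

where $E[n(t)\,n(\tau)] = \delta(t - \tau)$, the solution $x(t)$ is given by:

$$ x(t) = x(0)\exp(-t) + \int_{0}^{t} \exp[-(t-\lambda)]\,n(\lambda)\,d\lambda $$

and, taking $x(0) = 0$,

$$ E[x(t)\,x(t+\tau)] = E\left[\int_{0}^{t} \exp[-(t-\lambda)]\,n(\lambda)\,d\lambda
     \int_{0}^{t+\tau} \exp[-(t+\tau-\xi)]\,n(\xi)\,d\xi\right] $$

$$ = \int_{0}^{t+\tau}\int_{0}^{t} \exp[-(2t+\tau-\xi-\lambda)]\,E[n(\lambda)\,n(\xi)]\,d\lambda\,d\xi $$

$$ = \int_{0}^{t+\tau}\int_{0}^{t} \exp[-(2t+\tau-\xi-\lambda)]\,\delta(\xi-\lambda)\,d\lambda\,d\xi $$

$$ = \int_{0}^{t} \exp[-(2t+\tau-2\xi)]\,d\xi
   = \tfrac{1}{2}\exp[-(2t+\tau)]\,\exp(2\xi)\Big|_{0}^{t} $$

$$ = \tfrac{1}{2}\left\{\exp(-\tau) - \exp[-(2t+\tau)]\right\}, \quad \text{if } \tau > 0 $$

Now, as $t \to \infty$,

$$ R_X(\tau) = E[X(t)\,X(t+\tau)] = \tfrac{1}{2}\exp(-\tau), \quad \text{for all } \tau > 0 $$

Similarly, for $\tau < 0$,

$$ R_X(\tau) = \tfrac{1}{2}\exp(\tau) $$

so that $R_X(\tau) = \tfrac{1}{2}\exp(-|\tau|)$.

To obtain $S_X(\omega)$, we can either use Example 6 or the direct definition of
the Fourier transform. Thus,

$$ S_X(\omega) = \int_{-\infty}^{\infty} \tfrac{1}{2}\exp(-|\tau|)\exp(-j\omega\tau)\,d\tau
               = \frac{1}{\omega^2 + 1} $$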

Example 8

Suppose $S_X(\omega)$ of a process $X(t)$ is given by:

$$ S_X(\omega) = \frac{1}{\omega^2 + 1} $$

Find $R_X(\tau)$ by the Theory of Residues.

Before completing this example, let us give an informal discussion of the
inversion formula.

Let $f(t)$ be a given function such that its Fourier transform $F(\omega)$
exists. Then, for a fixed positive $\sigma > 0$, the Fourier transform of
$\exp(-\sigma t) f(t)$ also exists and is given by:

$$ \int_{-\infty}^{\infty} f(t)\exp(-\sigma t)\exp(-j\omega t)\,dt
   = \int_{-\infty}^{\infty} f(t)\exp[-(\sigma + j\omega)t]\,dt $$

Denote the integral as $F(\sigma + j\omega)$. Thus, $f(t)\exp(-\sigma t)$ is
given by:

$$ f(t)\exp(-\sigma t) = \mathcal{F}^{-1}[F(\sigma + j\omega)]
   = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\sigma + j\omega)\exp(j\omega t)\,d\omega $$

Multiplying both sides by $\exp(\sigma t)$ ($\sigma$ is constant), we get:

$$ f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\sigma + j\omega)\exp[(\sigma + j\omega)t]\,d\omega $$

Making the change of variable $s = \sigma + j\omega$, we obtain:

$$ f(t) = \frac{1}{2\pi j}\int_{\sigma - j\infty}^{\sigma + j\infty} F_B(s)\exp(st)\,ds
   = \begin{cases}
       \sum \text{residues of } F_B(s)\exp(st) \text{ at singularities to left of line chosen}, & t \ge 0 \\
       -\sum \text{residues of } F_B(s)\exp(st) \text{ at singularities to right of line chosen}, & t < 0
     \end{cases} $$

The equivalent bilateral transform corresponding to $S_X(\omega)$ is denoted by
$S_B(s)$ and is obtained from $S_X(\omega)$ by substituting $\omega = s/j$.

Now applying the inversion formula to Example 8:

$$ S_B(s) = \frac{1}{1 - s^2} = \frac{-1}{s^2 - 1} = \frac{-1}{(s-1)(s+1)} $$

where $S_B(s)$ exists for $-1 < \mathrm{Re}\,s < 1$. Now,

$$ R(\tau) = \frac{1}{2\pi j}\int_{c - j\infty}^{c + j\infty} S_B(s)\exp(s\tau)\,ds,
   \quad \text{where } -1 < c < 1 $$

$$ = \begin{cases}
       \sum \text{residues of } S_B(s)\exp(s\tau) \text{ at the left-half-plane poles of } S_B(s), & \tau > 0 \\
       -\sum \text{residues of } S_B(s)\exp(s\tau) \text{ at the right-half-plane poles of } S_B(s), & \tau < 0
     \end{cases} $$

$$ = \begin{cases}
       \left.\dfrac{-(s+1)\exp(s\tau)}{(s-1)(s+1)}\right|_{s=-1} = \tfrac{1}{2}\exp(-\tau), & \tau > 0 \\[2ex]
       -\left.\dfrac{-(s-1)\exp(s\tau)}{(s-1)(s+1)}\right|_{s=1}
         = \left.\dfrac{\exp(s\tau)}{1+s}\right|_{s=1} = \tfrac{1}{2}\exp(\tau), & \tau < 0
     \end{cases} $$

$$ = \tfrac{1}{2}\exp(-|\tau|) $$

Example 9

If $S(\omega)$ is a power spectrum of a given process, show that
$d^2S/d\omega^2$ is not a power spectrum.

Solution

$$ S(\omega) = \int_{-\infty}^{\infty} R(\tau)\exp(-j\omega\tau)\,d\tau $$

which implies:

$$ \frac{d^2 S(\omega)}{d\omega^2} = \int_{-\infty}^{\infty} [-\tau^2 R(\tau)]\exp(-j\omega\tau)\,d\tau
   = \mathcal{F}\{-\tau^2 R(\tau)\} $$

Now, if $d^2S(\omega)/d\omega^2$ is a power spectrum, we must have
$[-\tau^2 R(\tau)]$ as an autocorrelation function. Let
$G(\tau) = -\tau^2 R(\tau)$. If $G(\tau)$ is an autocorrelation, then we would
always have:

$$ |G(\tau)| \le G(0), \quad \text{for all } \tau $$

However, $G(0) = 0$ and

$$ 0 \le |G(\tau)| \le G(0) = 0 $$

cannot always be satisfied.
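The failure of nonnegative definiteness can also be seen with a two-point quadratic form. The Python sketch below assumes, purely for illustration, the autocorrelation (1/2) exp(-|tau|), forms G(tau) = -tau^2 R(tau), and evaluates the double sum of alpha_i C(t_i - t_k) alpha_k* for both functions. The helper `quadratic_form` and the chosen times and weights are hypothetical.

```python
import math


def R(tau):
    # Assumed autocorrelation; its spectrum 1/(1 + w^2) is nonnegative.
    return 0.5 * math.exp(-abs(tau))


def G(tau):
    # Candidate from Example 9: inverse transform of d^2 S / dw^2.
    return -tau * tau * R(tau)


def quadratic_form(corr, times, alphas):
    """Sum over i, k of alpha_i * corr(t_i - t_k) * conj(alpha_k).
    It is nonnegative for every choice of times and alphas when corr is an
    autocorrelation function."""
    total = 0j
    for t_i, a_i in zip(times, alphas):
        for t_k, a_k in zip(times, alphas):
            total += a_i * corr(t_i - t_k) * complex(a_k).conjugate()
    return total.real


print(quadratic_form(R, [0.0, 1.0], [1.0, 1.0]))  # positive
print(quadratic_form(G, [0.0, 1.0], [1.0, 1.0]))  # negative, so G is not an autocorrelation
```

A single negative value of the quadratic form is enough to rule out G as an autocorrelation function; no such counterexample exists for R.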


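Example 10

$X(t) = \cos(\omega_0 t + \theta)$, where $\theta \in [0, 2\pi]$ is uniformly
distributed. Find $S_X(\omega)$.

Solution

From Example 6, Chapter 2:

$$ R(\tau) = E[X(t)\,X(t+\tau)] = \tfrac{1}{2}\cos\omega_0\tau $$

$$ \Rightarrow S_X(\omega) = \mathcal{F}\left[\tfrac{1}{2}\cos\omega_0\tau\right]
   = \frac{\pi}{2}\left[\delta(\omega - \omega_0) + \delta(\omega + \omega_0)\right] $$

$$ = \frac{\pi}{2}\cdot\frac{1}{2\pi}\left[\delta(f - f_0) + \delta(f + f_0)\right]
   = \frac{1}{4}\left[\delta(f - f_0) + \delta(f + f_0)\right]; \quad f_0 = \frac{\omega_0}{2\pi} $$

Example 11

In Example 8 of Chapter 2, the autocorrelation function $R(\tau)$ was given
by:

$$ R(\tau) = \begin{cases} 1 - |\tau|, & |\tau| < 1 \\ 0, & |\tau| > 1 \end{cases} $$

Find $S(\omega)$.

Solution

$$ S(\omega) = \int_{-1}^{1} (1 - |\tau|)\exp(-j\omega\tau)\,d\tau
            = 2\int_{0}^{1} (1 - \tau)\cos\omega\tau\,d\tau
            = \frac{2(1 - \cos\omega)}{\omega^2}
            = \left[\frac{\sin(\omega/2)}{\omega/2}\right]^2 $$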
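The transform of the triangular autocorrelation in Example 11 can be checked numerically. Carrying out the integral gives the closed form 2(1 - cos w)/w^2; the Python sketch below compares it with a midpoint-rule evaluation of the defining integral. The function names and the sample frequencies are illustrative assumptions.

```python
import math


def R(tau):
    # Triangular autocorrelation of Example 11.
    return 1.0 - abs(tau) if abs(tau) < 1.0 else 0.0


def power_spectrum(w, steps=20000):
    """Midpoint-rule approximation of
    S(w) = integral over [-1, 1] of R(tau) cos(w tau) d tau.
    The sine part of exp(-j w tau) integrates to zero because R is even."""
    d_tau = 2.0 / steps
    total = 0.0
    for k in range(steps):
        tau = -1.0 + (k + 0.5) * d_tau
        total += R(tau) * math.cos(w * tau)
    return total * d_tau


def closed_form(w):
    # 2 (1 - cos w) / w^2, which equals [sin(w/2) / (w/2)]^2, for w != 0.
    return 2.0 * (1.0 - math.cos(w)) / (w * w)


for w in (0.5, 2.0, 7.0):
    print(w, power_spectrum(w), closed_form(w))
```

The spectrum is nonnegative for every frequency, as it must be for a valid autocorrelation function.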





3.4 MAJOR RESULT 

In what follows, we shall show that a function $R(\tau)$ which has a Fourier
transform $S(\omega)$ is an autocorrelation function of a stationary random
process $X(t)$ if and only if $S(\omega) \ge 0$ for all $\omega$, where
$X(\cdot)$ is continuous in the quadratic mean (q.m.). In order to prove this
major result, we need some important results given by Theorems 1 and 2, which
will appear in the sequel. We shall assume $X(t)$ is continuous in the q.m.
unless specified otherwise.

Theorem 1 (Bochner's Theorem)

The function $R(\tau)$ is an autocorrelation function of a stationary process
$X(t)$ if and only if $R(\tau)$ is nonnegative definite.

We have already shown that if $R(\tau)$ is an autocorrelation function, then it
is nonnegative definite, since for any collection of times
$t_1, t_2, \ldots, t_n$ and complex parameters
$\alpha_1, \alpha_2, \ldots, \alpha_n$:

$$ \sum_{i,k=1}^{n} \alpha_i\,R(t_i - t_k)\,\alpha_k^{*}
   = E\left[\left|\sum_{i=1}^{n} \alpha_i X(t_i)\right|^2\right] \ge 0 \tag{3.31} $$

However, the converse is more complicated and will not be proven here. (For
the proof, see Gnedenko, Theory of Probability, Chelsea publication, 1962.)


Theorem 2

A function $R(\tau)$ with the corresponding $S(\omega)$ is nonnegative definite
(autocorrelation) if and only if it can be represented by:

$$ R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(j\omega\tau)\,S(\omega)\,d\omega $$

where $S(\omega)$ is never negative (i.e., $S(\omega) \ge 0$, for all $\omega$).

The proof is relatively complicated and will be omitted here; for a proof
see the same reference shown in Theorem 1.

As a special case of the Fourier transform pair $R(\tau)$ and $S(\omega)$, we
have:

$$ S(0) = \int_{-\infty}^{\infty} R(\tau)\,d\tau \tag{3.32} $$

and

$$ \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,d\omega = R(0) \tag{3.33} $$

and $R(0)$ is the average power by definition, i.e.,

$$ R(0) = E[|X(t)|^2] $$


Definition 9

A stationary process $X(t)$ whose power spectrum $S(\omega)$ is constant for
all $\omega$ is called a white-noise process. If $S(\omega) = W_0 =$ constant,
we obtain:

$$ R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} W_0\exp(j\omega\tau)\,d\omega = W_0\,\delta(\tau) \tag{3.34} $$

Hence $R(0)$, which is the average power, is infinite. Thus, we conclude that
the white-noise process is a mathematical fiction, but one that is very useful
in practical applications. For example, it is convenient to utilize white
noise as an approximation to an actual process whose power spectrum is flat
(constant) over a frequency band.

In application problems such as those that occur in control and communi- 
cation, we are faced with physical noise sources which are added to the signal 
as a lump sum. The power spectrum of the overall noise is essentially flat up 
to frequencies much higher than those that are significant for the signal and 
the system. 


3.5 INPUT-OUTPUT RELATIONS 

Very often we confront a situation where we pass a stationary process $X(t)$
through a time-invariant system, and are interested in determining the output
(along with its statistics).

Consider the (bounded) sample function $X(t)$ from the ensemble $\{X(t)\}$
which is applied to a time-invariant system with impulse response $h(t)$ (see
sketch) and the output $Y(t)$.

[Sketch: $X(t) \to h(t) \to Y(t)$]

We know $Y(t)$ can be written as:

$$ Y(t) = \int_{-\infty}^{\infty} h(\lambda)\,X(t - \lambda)\,d\lambda \tag{3.35} $$

Now let us find $R_Y(\tau)$.

From Eq. (3.35), we have:

$$ Y(t + \tau) = \int_{-\infty}^{\infty} h(u)\,X(t + \tau - u)\,du $$

Thus, $R_Y(\tau) = E[Y(t)\,Y(t+\tau)]$ can be written as:

$$ R_Y(\tau) = E\left[\int_{-\infty}^{\infty} h(\lambda)\,X(t-\lambda)\,d\lambda
     \int_{-\infty}^{\infty} h(u)\,X(t+\tau-u)\,du\right] \tag{3.36} $$

Rewriting Eq. (3.36) and taking the expectation inside yields:

$$ R_Y(\tau) = E[Y(t)\,Y(t+\tau)]
   = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\lambda)\,h(u)\,E[X(t-\lambda)\,X(t+\tau-u)]\,d\lambda\,du $$

$$ = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\lambda)\,h(u)\,R_X(\tau + \lambda - u)\,d\lambda\,du $$

$$ = h(-\tau) * h(\tau) * R_X(\tau) \tag{3.37} $$


Now if $S_Y(\omega)$, $H(\omega)$, and $S_X(\omega)$ exist, we can apply the
Fourier transform to Eq. (3.37) to get:

$$ S_Y(\omega) = \mathcal{F}\{h(-\tau)\}\cdot\mathcal{F}\{h(\tau)\}\cdot\mathcal{F}\{R_X(\tau)\}
   = H^{*}(\omega)\,H(\omega)\,S_X(\omega) = |H(\omega)|^2\,S_X(\omega) \tag{3.38} $$

which is an important relationship yielding $S_Y(\omega)$ in terms of
$S_X(\omega)$ and the system transfer function $H(\omega)$.

Remark 3. From Eq. (3.37) it is obvious that $E[Y(t)\,Y(t+\tau)]$ is a
function of $\tau$ alone, and also, due to stationarity of $X(t)$,
$E\,X(t) = m =$ constant, which implies $E\,Y(t)$ is also constant (see
Eq. 3.35). Hence, $Y(t)$ is stationary (wide sense).

Remark 4.

$$ R_Y(0) = E[|Y(t)|^2] = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_Y(\omega)\,d\omega
   = \frac{1}{2\pi}\int_{-\infty}^{\infty} |H(j\omega)|^2\,S_X(\omega)\,d\omega \tag{3.39} $$

Remark 5. The results are also true for the complex stochastic processes.
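Equation (3.39) lends itself to a direct numerical evaluation. The Python sketch below assumes a hypothetical first-order low-pass system H(w) = 1/(1 + jw) driven by a flat input spectrum K0, for which the integral works out analytically to K0/2. The substitution w = tan(theta), used to map the infinite frequency axis onto a finite interval, is a convenience of this sketch and is not part of the text.

```python
import math


def output_power(H, S_x, steps=20000):
    """Approximate R_Y(0) = (1/2pi) * integral of |H(w)|^2 S_X(w) dw (Eq. 3.39).
    The substitution w = tan(theta) maps the range onto (-pi/2, pi/2)."""
    d_theta = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2.0 + (k + 0.5) * d_theta
        w = math.tan(theta)
        # dw = d_theta / cos(theta)^2
        total += abs(H(w)) ** 2 * S_x(w) / math.cos(theta) ** 2
    return total * d_theta / (2.0 * math.pi)


K0 = 2.0  # assumed height of the flat input spectrum


def H(w):
    # Hypothetical first-order low-pass transfer function 1 / (1 + jw).
    return 1.0 / (1.0 + 1j * w)


def S_x(w):
    # Flat (white-noise) input spectrum.
    return K0


print(output_power(H, S_x))  # analytically K0 / 2
```

Although the input has infinite average power, the output power is finite because |H(w)|^2 decays fast enough to make the integral converge.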


3.6 INPUT-OUTPUT OF MULTIPLE TERMINALS 

Suppose we have two time-invariant systems characterized by their impulse
responses $h_1(\cdot)$ and $h_2(\cdot)$, respectively (see sketch):

[Sketch: (a) $X_1(t) \to h_1(t) \to Y_1(t)$; (b) $X_2(t) \to h_2(t) \to Y_2(t)$]

where $X_1(t)$ and $X_2(t)$ are sample functions from $\{X(t)\}$, which as
before is assumed to be a stationary ensemble.

Let us calculate $R_{Y_1Y_2}(\tau)$. As before, $Y_1(t)$ and $Y_2(t)$ can be
written as:

$$ Y_1(t) = \int_{-\infty}^{\infty} h_1(\lambda)\,X_1(t - \lambda)\,d\lambda \tag{3.40} $$

$$ Y_2(t) = \int_{-\infty}^{\infty} h_2(u)\,X_2(t - u)\,du \tag{3.41} $$

and a simple calculation (similar to the previous case) of $R_{Y_1Y_2}(\tau)$
would lead to:

$$ R_{Y_1Y_2}(\tau) = E[Y_1(t)\,Y_2(t+\tau)]
   = E\left[\int_{-\infty}^{\infty} h_1(\lambda)\,X_1(t-\lambda)\,d\lambda
     \int_{-\infty}^{\infty} h_2(u)\,X_2(t+\tau-u)\,du\right] $$

$$ = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_1(\lambda)\,h_2(u)\,R_{X_1X_2}(\tau + \lambda - u)\,d\lambda\,du \tag{3.42} $$

where $R_{X_1X_2}(\tau)$ is the cross-correlation of $X_1(t)$ and $X_2(t)$.
Hence, once again:

$$ R_{Y_1Y_2}(\tau) = h_1(-\tau) * h_2(\tau) * R_{X_1X_2}(\tau) \tag{3.43} $$

Thus, assuming that the appropriate Fourier transforms exist, we obtain:

$$ S_{Y_1Y_2}(\omega) = H_1(-\omega)\,H_2(\omega)\,S_{X_1X_2}(\omega)
   = H_1^{*}(\omega)\,H_2(\omega)\,S_{X_1X_2}(\omega) \tag{3.44} $$

which is a very general result, relating the input spectrum
$S_{X_1X_2}(\omega)$ to the output spectrum $S_{Y_1Y_2}(\omega)$.

Note that as a special case of Eq. (3.44), if we let $X_1 = X_2$ and
$h_1 = h_2$ (which implies $Y_1 = Y_2$), we obtain Eq. (3.38). Note that
Eq. (3.44) is also true for complex processes.

Remark 6. The reader may verify for himself that if $X_1(t)$ and $X_2(t)$ are
uncorrelated, so are $Y_1(t)$ and $Y_2(t)$.

Discussion

In applications, $H_1(\omega)$ and $H_2(\omega)$ very often have finite
bandwidths, i.e., $H_1(\omega) = 0$ for $|\omega| > \omega_a$ for some
$\omega_a$ and, similarly, $H_2(\omega) = 0$ for $|\omega| > \omega_b$ for some
$\omega_b$. It is obvious that if $H_1(\omega)$ and $H_2(\omega)$ have
nonoverlapping spectra, then

$$ H_1^{*}(\omega)\,H_2(\omega) = 0 $$

which would yield:

$$ S_{Y_1Y_2}(\omega) = 0 $$

or

$$ R_{Y_1Y_2}(\tau) = 0 $$

In that case, the processes $Y_1(t)$ and $Y_2(t)$ would be orthogonal.

A very important consequence of the above is that if $X(t)$ is transmitted
through an ideal filter, i.e.,

$$ |H(\omega)| = \begin{cases} A_0, & |\omega| < \omega_0 \\ 0, & \text{otherwise} \end{cases} $$

then the output signal $Y(t)$ and the signal suppressed by the filter would be
orthogonal. That is, if $X(t)$ has a frequency content beyond $\omega_0$, it is
going to be suppressed by $H(\omega)$, and the suppressed portion is orthogonal
to $Y(t)$.

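Example 12

A white-noise voltage source $X(t)$ with power spectrum $S_X(\omega) = K_0$ is
applied to an RLC network (see sketch). Assuming that the system (circuit) is
at rest at $t = 0$ (no transients), determine $S_Y(\omega)$.

Solution

$$ H(\omega) = \frac{\dfrac{1}{j\omega C}}{R + j\omega L + \dfrac{1}{j\omega C}}
   = \frac{\dfrac{1}{j\omega}}{1 + j\omega + \dfrac{1}{j\omega}} $$

We know that:

$$ S_Y(\omega) = |H(\omega)|^2\,S_X(\omega) = |H(\omega)|^2\,K_0 $$

$|H(\omega)|^2$ can be calculated from the above as follows:

$$ |H(\omega)|^2 = \left|\frac{\dfrac{1}{j\omega}}{1 + j\omega + \dfrac{1}{j\omega}}\right|^2
   = \frac{\dfrac{1}{\omega^2}}{1 + \left(\omega - \dfrac{1}{\omega}\right)^2} $$

$$ \Rightarrow S_Y(\omega) = \frac{\dfrac{1}{\omega^2}}{1 + \left(\omega - \dfrac{1}{\omega}\right)^2}\,K_0
   = \frac{K_0}{\omega^4 - \omega^2 + 1} $$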


3.7 SAMPLING THEOREM 

The sampling theorem (due to C. E. Shannon*) is very important and has
produced some unexpected results. The utilization of this theorem is prevalent
in control and communication theory. It must be emphasized that the sampling
theorem, whether we are dealing with deterministic or stochastic signals,
will only hold for band-limited signals, that is, signals whose Fourier
transforms are identically zero beyond a finite band of frequencies. In order
to develop this concept, we shall first deal with a signal $X(t)$, which is
deterministic. To be more precise, we shall state the theorem.

Theorem 3

Given a deterministic signal $X(t)$ whose Fourier transform
$\mathcal{X}(\omega)$ is zero beyond $|\omega| > \omega_c$ rad/s (see sketch):

$$ \mathcal{X}(\omega) = 0, \quad \text{for all } |\omega| > \omega_c $$

[Sketch: $|\mathcal{X}(\omega)|$ versus $\omega$, zero outside the band $|\omega| \le \omega_c$]

Then $X(t)$ can be completely and uniquely recovered from its values sampled at
uniform intervals of $T = \pi/\omega_c$ seconds (or smaller), and it is given
by:

$$ X(t) = \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin[\omega_c(t - nT)]}{\omega_c(t - nT)} \tag{3.45} $$

*C. E. Shannon, "Communication in the Presence of Noise," Proc. IRE, Jan. 1949.


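Proof

There are several ways of proving this important theorem, but we shall give
the simplest proof.

From the inverse Fourier transform, we obtain:

$$ X(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \mathcal{X}(\omega)\exp(j\omega t)\,d\omega
        = \frac{1}{2\pi}\int_{-\omega_c}^{\omega_c} \mathcal{X}(\omega)\exp(j\omega t)\,d\omega \tag{3.46} $$

Now, assume that $\mathcal{X}(\omega)$ is a part of a periodic function
$\mathcal{X}^{+}(\omega)$ (see sketch), such that:

$$ \mathcal{X}(\omega) = \mathcal{X}^{+}(\omega), \quad \text{if } |\omega| < \omega_c $$

[Sketch: $|\mathcal{X}^{+}(\omega)|$, the periodic extension of $|\mathcal{X}(\omega)|$]

Hence, for $|\omega| < \omega_c$ (see Appendix D),

$$ \mathcal{X}(\omega) = \sum_{n=-\infty}^{\infty} b_n\exp(jn\omega T) \tag{3.47} $$

where

$$ T = \frac{\pi}{\omega_c} $$

and $b_n$ is given by:

$$ b_n = \frac{T}{2\pi}\int_{-\omega_c}^{\omega_c} \mathcal{X}(\omega)\exp(-jn\omega T)\,d\omega \tag{3.48} $$

If we substitute $t = -nT$ in Eq. (3.46), we obtain:

$$ X(-nT) = \frac{1}{2\pi}\int_{-\omega_c}^{\omega_c} \mathcal{X}(\omega)\exp(-jn\omega T)\,d\omega
          = \frac{1}{T}\cdot\frac{T}{2\pi}\int_{-\omega_c}^{\omega_c} \mathcal{X}(\omega)\exp(-jn\omega T)\,d\omega $$

Now, utilizing the definition of $b_n$ from Eq. (3.48), we get from the above
equation:

$$ X(-nT) = \frac{1}{T}\,b_n $$

or, equivalently,

$$ b_n = T\,X(-nT) $$

Using the above in Eq. (3.47) yields:

$$ \mathcal{X}(\omega) = T\sum_{n=-\infty}^{\infty} X(nT)\exp(-jn\omega T) \tag{3.49} $$

Now, if we substitute Eq. (3.49) into Eq. (3.46), we obtain:

$$ X(t) = \sum_{n=-\infty}^{\infty} X(nT)\,\frac{T}{2\pi}\int_{-\omega_c}^{\omega_c} \exp[j\omega(t - nT)]\,d\omega
        = \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin[\omega_c(t - nT)]}{\omega_c(t - nT)} $$

which is exactly the result we are after.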
Remark 7. If we substitute $T = \pi/\omega_c$, then

$$ \frac{\sin[\omega_c(t - nT)]}{\omega_c(t - nT)} = \frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} $$

Remark 8. The function

$$ \frac{\sin[\omega_c(t - nT)]}{\omega_c(t - nT)} = \frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} \tag{3.51} $$

is an "interpolation function" which is multiplied by $X(nT)$ and is summed
over all $n$ to yield $X(t)$.
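The interpolation formula of Eq. (3.45) can be tried on a deterministic signal. The Python sketch below assumes a hypothetical band-limited test signal, the square of sin(x)/x with x = w_c t / 2, whose transform is a triangle that vanishes for |w| >= w_c. The band limit `OMEGA_C = pi`, the truncation of the infinite sum to |n| <= 500, and all function names are assumptions of this sketch.

```python
import math


OMEGA_C = math.pi          # assumed band limit (rad/s)
T = math.pi / OMEGA_C      # sampling interval of Theorem 3


def sinc(x):
    # sin(x)/x with the limiting value 1 at x = 0.
    return 1.0 if x == 0.0 else math.sin(x) / x


def signal(t):
    # Hypothetical band-limited test signal; its transform is a triangle
    # that vanishes for |w| >= OMEGA_C.
    return sinc(OMEGA_C * t / 2.0) ** 2


def reconstruct(t, n_terms=500):
    """Truncated version of Eq. (3.45): sum over |n| <= n_terms of
    X(nT) * sin[w_c (t - nT)] / [w_c (t - nT)]."""
    total = 0.0
    for n in range(-n_terms, n_terms + 1):
        total += signal(n * T) * sinc(OMEGA_C * (t - n * T))
    return total


for t in (0.3, 1.7, -2.45):
    print(t, signal(t), reconstruct(t))
```

The reconstructed values should agree with the signal between the sample points, up to the small error introduced by truncating the sum.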

Now we shall discuss the case where $X(t)$ is a stochastic process. We will
show that the result given by Eq. (3.51) holds for the stochastic case in the
quadratic mean (q.m.). That is,

$$ X(t) \overset{\text{q.m.}}{=} \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} $$

or, equivalently,

$$ E\left[\left|X(t) - \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right|^2\right] = 0 \tag{3.52} $$

Before developing the proof, we shall discuss some properties concerning
the periodicity of $X(t)$ and $R_X(\tau)$. In what follows, $X(t)$ is assumed
to be stationary.

In Appendix D we discuss the periodicity of the deterministic signals and
construct an infinite-dimensional vector space $A$, with its subspace $H$ that
is spanned by the set

$$ \{\exp(jn\omega_0 t)\}_{n=-\infty}^{\infty} $$

For stochastic signals we shall modify Appendix D. If we change the norm

$$ \int_{-T/2}^{T/2} |f(t)|^2\,dt $$

in the appendix for the deterministic case to:

$$ \|X\|^2 = E(|X(t)|^2) $$

for the stochastic case, all of the results will hold. Thus, the norm for the
stochastic case is the quadratic mean or the mean square. Now if a stochastic
process $X(t)$ is periodic (almost everywhere and not in the quadratic mean as
yet), i.e.,

$$ X(t) = X(t + T) = X(t + nT) $$

then it is very easy to verify that $R_X(\tau)$ is also periodic, since

$$ R_X(\tau) = E[X(t+\tau)\,X^{*}(t)] = E[X(t+\tau+nT)\,X^{*}(t)] = R_X(\tau + nT) \tag{3.53} $$
Hence, the periodicity of $X(t)$ implies the periodicity of $R_X(\tau)$.
However, if $R_X(\tau)$ is periodic with the period $T$, i.e.,

$$ R_X(\tau + T) = R_X(\tau) $$

then (left as an exercise), it can be shown that $X(t)$ is periodic in the
quadratic mean and can be expanded into a Fourier series:

$$ X(t) \overset{\text{q.m.}}{=} \sum_{n=-\infty}^{\infty} a_n\exp(jn\omega_0 t) \tag{3.54} $$

where the $a_n$'s are the usual Fourier coefficients and are pairwise
orthogonal, i.e.,

$$ E[a_m\,a_n^{*}] = 0, \quad \text{for all } n \ne m $$

We can also write $R_X(\tau)$ as a Fourier series given by:

$$ R_X(\tau) = \sum_{n=-\infty}^{\infty} C_n\exp(jn\omega_0\tau) \tag{3.55} $$

where the $C_n$'s are again the Fourier coefficients, and the $a_n$'s and the
$C_n$'s are related via:

$$ C_n = E[|a_n|^2] $$

Now, if we use the Fourier transform on $R_X(\tau)$, we get:

$$ S_X(\omega) = 2\pi\sum_{n=-\infty}^{\infty} C_n\,\delta(\omega - n\omega_0) \tag{3.56} $$
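Let us assume that the autocorrelation function $R_X(\tau)$ has a band-limited
spectrum $S_X(\omega)$. Application of Eq. (3.45) would give rise to:

$$ R_X(\tau) = \sum_{n=-\infty}^{\infty} R_X(nT)\,\frac{\sin(\omega_c\tau - n\pi)}{\omega_c\tau - n\pi} \tag{3.57} $$

We now show that:

$$ X(t) \overset{\text{q.m.}}{=} \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} \tag{3.58} $$

The following proof is from reference [1]. To prove Eq. (3.58), we show:

$$ E\left\{\left[X(t) - \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right] X^{*}(mT)\right\}
   = R(t - mT) - \sum_{n=-\infty}^{\infty} R(nT - mT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} = 0 \tag{3.59} $$

where it is left to the reader to verify that:

$$ R(t - mT) = \sum_{n=-\infty}^{\infty} R(nT - mT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} $$

which is shown by substituting $t - mT$ for $\tau$ in Eq. (3.57).
Now, utilizing the identity

$$ R(t) = \sum_{n=-\infty}^{\infty} R(nT - mT)\,\frac{\sin[\omega_c(t + mT) - n\pi]}{\omega_c(t + mT) - n\pi} $$

(where this identity is proven by changing $t - mT$ to $t$ in the preceding
equation), we now get:

$$ E\left\{\left[X(t) - \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right] X^{*}(t)\right\}
   = R(0) - \sum_{n=-\infty}^{\infty} R(nT - t)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi} = 0 \tag{3.60} $$

Thus, utilizing Eqs. (3.59) and (3.60), it is easy to show that:

$$ E\left\{\left|X(t) - \sum_{n=-\infty}^{\infty} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right|^2\right\} $$

$$ = E\left\{\left[X(t) - \sum_{n} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right] X^{*}(t)\right\}
   - \sum_{m} \frac{\sin(\omega_c t - m\pi)}{\omega_c t - m\pi}\,
     E\left\{\left[X(t) - \sum_{n} X(nT)\,\frac{\sin(\omega_c t - n\pi)}{\omega_c t - n\pi}\right] X^{*}(mT)\right\} = 0 $$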


3.8 SUMMARY OF SOME USEFUL RESULTS 

In what follows, we shall summarize some significant properties concerning
complex stationary (wide-sense) processes $X(t)$, $Y(t)$, and $Z(t)$.

(1) $R_X(0) = E[|X(t)|^2]$.

(2) $R_X^{*}(\tau) = R_X(-\tau)$.

If $X$ is a real process, then

$$ R_X(\tau) = R_X(-\tau) $$

(3) If $Z(t) = X(t) + Y(t)$, then

$$ R_Z(\tau) = R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau) $$

where $R_{XY}(\tau) = R_{YX}^{*}(-\tau)$.

(4) $E\{|X(t+\tau) - X(t)|^2\} = 2\,\mathrm{Re}\,[R(0) - R(\tau)]$.

(5) Assume $R_X(\tau)$ is not periodic; then

$$ \lim_{|\tau|\to\infty} R_X(\tau) = |m|^2 $$

where $m = E[X]$, provided $X(t)$ and $X(t+\tau)$ are uncorrelated as
$|\tau| \to \infty$. Thus, if $E(X(t)) = 0$, then

$$ \lim_{|\tau|\to\infty} R_X(\tau) = 0 $$

(6) $R_X(0) \ge |R_X(\tau)|$, for all $\tau$.

(7) $R(\cdot)$ is an autocorrelation function iff it is nonnegative definite.

(8) $R(\cdot)$ is an autocorrelation function iff its Fourier transform
$S(\omega) \ge 0$, for all $\omega$.

(9) If $X(t)$ is the input of a time-invariant system with the transfer
function $H(\omega)$, then the power spectrum of the output $Y(t)$ is given by:

$$ S_Y(\omega) = |H(\omega)|^2\,S_X(\omega) $$


3.9 IDEAL LOW-PASS SIGNALS 

We shall define $X(t)$ to be an ideal low-pass process if $S_X(\omega)$ is
given by:

$$ S_X(\omega) = \begin{cases} K_0, & |\omega| < \omega_c \\ 0, & \text{otherwise} \end{cases} $$

Invoking the inverse Fourier transform, one obtains:

$$ R_X(\tau) = \mathcal{F}^{-1}[S_X(\omega)]
   = \frac{1}{2\pi}\int_{-\omega_c}^{\omega_c} S_X(\omega)\exp(j\omega\tau)\,d\omega
   = K\,\frac{\sin\omega_c\tau}{\omega_c\tau}, \quad K = \frac{K_0\,\omega_c}{\pi} $$

Now let us find $R_X(\tau)$ as $\tau \to 0$ (we shall denote this limit by
$R_X(0)$). Using L'Hospital's rule on the above equation, we get:

$$ \lim_{\tau\to 0} R_X(\tau) = \lim_{\tau\to 0} K\,\frac{\sin\omega_c\tau}{\omega_c\tau}
   = K\lim_{\tau\to 0}\frac{\dfrac{d}{d\tau}(\sin\omega_c\tau)}{\dfrac{d}{d\tau}(\omega_c\tau)} = K $$

Hence, we can write:

$$ R(\tau) = R(0)\,\frac{\sin\omega_c\tau}{\omega_c\tau} \tag{3.61} $$

From the above equation it is easy to verify that, with $T = \pi/\omega_c$,
$R(nT) = 0$ for all $n \ne 0$. We can also show that the samples $X(nT)$ are
mutually orthogonal. This is true since

$$ E[X(nT)\,X(mT)] = R_X[(n - m)T] = 0, \quad \text{for all } n \ne m $$


Now we shall summarize a significant result via the following theorem:

Theorem 4

A band-limited process $X(t)$ is low-pass iff the samples $X(nT)$ are mutually
orthogonal.

Proof

We have shown that if $X(\cdot)$ is low-pass (characterized by Eq. 3.61), then
the $X(nT)$ are mutually orthogonal. If the $X(nT)$ are mutually orthogonal,
all we need to show is Eq. (3.61). Now $R_X(nT)$ by definition is given by

$$ R_X(nT) = E[X(nT)\,X(0)] = 0, \quad \text{for all } n \ne 0 $$

because of orthogonality. Invoking the sampling theorem (see Eq. 3.57), we
get:

$$ R(\tau) = \sum_{n=-\infty}^{\infty} R(nT)\,\frac{\sin(\omega_c\tau - n\pi)}{\omega_c\tau - n\pi}
   = \cdots + 0 + 0 + R(0)\,\frac{\sin\omega_c\tau}{\omega_c\tau} + 0 + 0 + \cdots
   = R(0)\,\frac{\sin\omega_c\tau}{\omega_c\tau} $$
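The form of Eq. (3.61) and the zeros of the autocorrelation at the sampling instants can be confirmed numerically. The Python sketch below integrates a flat spectrum of assumed height `K0 = 3` over an assumed band `OMEGA_C = 2` rad/s and compares the result with the closed form; the constants and function names are illustrative assumptions.

```python
import math


K0 = 3.0                 # assumed spectrum height
OMEGA_C = 2.0            # assumed cutoff (rad/s)
T = math.pi / OMEGA_C    # sampling interval


def autocorrelation(tau, steps=20000):
    """R_X(tau) = (1/2pi) * integral over [-w_c, w_c] of K0 cos(w tau) dw,
    evaluated with the midpoint rule."""
    dw = 2.0 * OMEGA_C / steps
    total = 0.0
    for k in range(steps):
        w = -OMEGA_C + (k + 0.5) * dw
        total += K0 * math.cos(w * tau)
    return total * dw / (2.0 * math.pi)


def closed_form(tau):
    # Eq. (3.61) with R(0) = K0 * OMEGA_C / pi.
    x = OMEGA_C * tau
    return (K0 * OMEGA_C / math.pi) * (1.0 if x == 0.0 else math.sin(x) / x)


print(autocorrelation(0.7), closed_form(0.7))
for n in (1, 2, 3):
    # At tau = nT with n != 0 the autocorrelation should vanish.
    print(n, autocorrelation(n * T))
```

The vanishing values at tau = nT are what make the samples X(nT) mutually orthogonal.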
3.10 REPRESENTATION OF BAND-PASS PROCESSES

A signal $X(t)$ whose power spectrum is defined only over a band

$$ \omega_0 - \omega_c < |\omega| < \omega_0 + \omega_c $$

and is zero outside the band (see sketch) is called a band-pass process. Note
that the power spectrum $S_X(\omega)$ is defined only for stationary processes.
We observe that the band-pass corresponding to the stationary process $X(t)$ is
$2\omega_c$ and is centered at $\omega = \omega_0$.

[Sketch: $S_X(\omega)$ versus $\omega$ (rad/s)]

In what follows we shall show that a band-pass process consists of two
components, given by:

$$ X(t) = X_1(t)\cos\omega_0 t + X_2(t)\sin\omega_0 t \tag{3.62} $$

where $X_1(t)$ and $X_2(t)$ are stationary (wide sense), and
$S_{X_1}(\omega) = S_{X_2}(\omega)$. In addition, these power spectra are shown
to be related to $S_X(\omega)$ by the equation:

$$ S_{X_1}(\omega) = S_{X_2}(\omega) =
   \begin{cases} S_X(\omega + \omega_0) + S_X(\omega - \omega_0), & |\omega| < \omega_c \\
                 0, & |\omega| > \omega_c \end{cases} \tag{3.63} $$

We can also show that $S_{X_1X_2}(\omega)$ and $S_{X_2X_1}(\omega)$ are related
by:

$$ S_{X_1X_2}(\omega) = -S_{X_2X_1}(\omega) =
   \begin{cases} j[S_X(\omega - \omega_0) - S_X(\omega + \omega_0)], & |\omega| < \omega_c \\
                 0, & |\omega| > \omega_c \end{cases} \tag{3.64} $$

Note that $S_{X_1X_2}(\omega)$ is not necessarily nonnegative because
$R_{X_1X_2}(\tau)$ is not necessarily nonnegative definite. Furthermore, as a
consequence of Eq. (3.51) it can be shown that:

$$ E|X(t)|^2 = E|X_1(t)|^2 = E|X_2(t)|^2 \tag{3.65} $$

Summarizing the above via a theorem is now appropriate.

Theorem 5

$X(t)$ is a band-pass process (which implies $X(t)$ is stationary) with the
corresponding $S_X(\omega)$ given above (also see accompanying sketch) iff
$X(t)$ can be described as in Eq. (3.62) and Eqs. (3.63) and (3.64) are
satisfied.

Proof

Let $Z(t)$ be a random process such that $S_Z(\omega) = 4\,S_X(\omega)$ for
$\omega > 0$ and is zero for $\omega < 0$, i.e.,

$$ S_Z(\omega) = 4\,S_X(\omega)\,1(\omega) \tag{3.66} $$

where $1(\cdot)$ denotes the unit step. From Eq. (3.66), we can model $Z(t)$ as
the output of a linear system, with the input $X(t)$ and the transfer function
given by:

$$ H_Z(\omega) = 2\cdot 1(\omega) \tag{3.67} $$

It is easy to observe that:

$$ 2\cdot 1(\omega) = 1 + \mathrm{sgn}\,\omega $$

where

$$ \mathrm{sgn}\,\omega = \begin{cases} 1, & \omega > 0 \\ -1, & \omega < 0 \end{cases} $$

Thus, $Z(t)$ can be modeled by an alternate approach, i.e.,

$$ Z(t) = X(t) + j\hat{X}(t) \tag{3.68} $$

where $\hat{X}(t)$ is defined by using $X(t)$ as the input of a linear system
with a corresponding transfer function $H(\omega)$ given by:

$$ H(\omega) = -j\,\mathrm{sgn}\,\omega, \quad \text{i.e.,} \quad h(t) = \frac{1}{\pi t} \tag{3.69} $$

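Hence,

$$ \hat{X}(t) = \int_{-\infty}^{\infty} h(t - \tau)\,X(\tau)\,d\tau
   = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{X(\tau)}{t - \tau}\,d\tau
   = \frac{1}{\pi t} * X(t) \tag{3.70} $$

We define $\hat{X}(t)$ given by Eq. (3.70) as the Hilbert transform of $X(t)$.
The process $Z(t)$ is called the analytic signal associated with $X(t)$. It is
useful to observe that if $\hat{X}(t)$ is the input to the system with the
transfer function $H(\omega) = -j\,\mathrm{sgn}\,\omega$, then the output is
$-X(t)$ because:

$$ (H(\omega))^2 = (-j\,\mathrm{sgn}\,\omega)^2 = -1 \tag{3.71} $$

From Eq. (3.71), we can verify $|H(\omega)|^2 = 1$ and

$$ S_{\hat{X}}(\omega) = S_X(\omega) \quad \text{and} \quad R_{\hat{X}}(\tau) = R_X(\tau) \tag{3.72} $$

Let $\hat{\hat{X}}(t)$ denote the output of a system with the input
$\hat{X}(t)$ and the transfer function $H(\omega) = -j\,\mathrm{sgn}\,\omega$;
then it is easy to verify that:

$$ \hat{\hat{X}}(t) = -X(t) \tag{3.73} $$

Hence, for the processes $X(t)$, $\hat{X}(t)$, and $\hat{\hat{X}}(t)$, their
behavior can be summarized as:

[Sketch: $X(t) \to H(\omega) = -j\,\mathrm{sgn}\,\omega \to \hat{X}(t) \to H(\omega) = -j\,\mathrm{sgn}\,\omega \to \hat{\hat{X}}(t) = -X(t)$]

Now, utilizing the facts that:

$$ S_{X\hat{X}}(\omega) = S_X(\omega)\,H^{*}(\omega) = j\,\mathrm{sgn}\,\omega\;S_X(\omega) \tag{3.74} $$

and

$$ S_{\hat{X}X}(\omega) = S_X(\omega)\,H(\omega) = -j\,\mathrm{sgn}\,\omega\;S_X(\omega) \tag{3.75} $$

then

$$ S_{X\hat{X}}(\omega) = -S_{\hat{X}X}(\omega) \tag{3.76} $$

and

$$ R_{X\hat{X}}(\tau) = -R_{\hat{X}X}(\tau) \tag{3.77} $$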

Now we shall consider the process $Z(t)\exp(-j\omega_0 t)$, and let

$$ Z(t)\exp(-j\omega_0 t) = X_1(t) - jX_2(t) \tag{3.78} $$

That is,

$$ X_1(t) = \mathrm{Re}\,[Z(t)\exp(-j\omega_0 t)] = X(t)\cos\omega_0 t + \hat{X}(t)\sin\omega_0 t \tag{3.79} $$

$$ X_2(t) = -\mathrm{Im}\,[Z(t)\exp(-j\omega_0 t)] = X(t)\sin\omega_0 t - \hat{X}(t)\cos\omega_0 t \tag{3.80} $$

from which we can obtain:

$$ X(t) = X_1(t)\cos\omega_0 t + X_2(t)\sin\omega_0 t \tag{3.81} $$

$$ \hat{X}(t) = X_1(t)\sin\omega_0 t - X_2(t)\cos\omega_0 t \tag{3.82} $$

From Eqs. (3.79) and (3.80), we obtain:

$$ E[X_1(t+\tau)\,X_1(t)] = \tfrac{1}{2}\Big\{[R_X(\tau) + R_{\hat{X}}(\tau)]\cos\omega_0\tau
   + [R_{X\hat{X}}(\tau) - R_{\hat{X}X}(\tau)]\sin\omega_0\tau $$

$$ \qquad + [R_X(\tau) - R_{\hat{X}}(\tau)]\cos\omega_0(2t+\tau)
   + [R_{X\hat{X}}(\tau) + R_{\hat{X}X}(\tau)]\sin\omega_0(2t+\tau)\Big\} $$

Now if we use Eqs. (3.72) and (3.77) in the above, we obtain:

$$ R_{X_1}(\tau) = R_X(\tau)\cos\omega_0\tau + R_{X\hat{X}}(\tau)\sin\omega_0\tau \tag{3.83} $$

which is stationary, and, similarly,

$$ R_{X_2}(\tau) = R_X(\tau)\cos\omega_0\tau + R_{X\hat{X}}(\tau)\sin\omega_0\tau \tag{3.84} $$

which implies:

$$ R_{X_1}(\tau) = R_{X_2}(\tau) \tag{3.85} $$

Now, $S_{X_1}(\omega)$ can be obtained from (3.83) by:

$$ S_{X_1}(\omega) = \tfrac{1}{2}[S_X(\omega + \omega_0) + S_X(\omega - \omega_0)]
   + \tfrac{1}{2}[S_X(\omega + \omega_0)\,\mathrm{sgn}(\omega + \omega_0)
   - S_X(\omega - \omega_0)\,\mathrm{sgn}(\omega - \omega_0)] \tag{3.86} $$

Let $S_Q(\omega)$ denote $S_X(\omega)$ where we translate $S_X(\omega)$ from its
center at $\omega_0$ to the zero frequency. It can be verified that:

$$ S_X(\omega + \omega_0) = S_Q(\omega) + S_Q(-\omega - 2\omega_0) \tag{3.87} $$

$$ S_X(\omega - \omega_0) = S_Q(-\omega) + S_Q(\omega - 2\omega_0) \tag{3.88} $$

Substituting (3.87) and (3.88) into (3.86) yields:

$$ S_{X_1}(\omega) = S_Q(\omega) + S_Q(-\omega) \tag{3.89} $$

and, further, it can be shown that:

$$ S_{X_1}(\omega) = S_Q(\omega) + S_Q(-\omega) =
   \begin{cases} S_X(\omega + \omega_0) + S_X(\omega - \omega_0), & |\omega| < W \\
                 0, & |\omega| > W \end{cases} \tag{3.90} $$

where $W$ is the width of the band-pass.

From Eq. (3.85), we also have:

$$ S_{X_2}(\omega) = S_{X_1}(\omega) $$

Hence, we have shown (see Eq. 3.90) that $X_1(t)$ and $X_2(t)$ are low-pass
processes.

To find $R_{X_1X_2}(\tau)$, which is equal to $-R_{X_2X_1}(\tau)$, we use
Eqs. (3.79) and (3.80) to obtain:

$$ R_{X_1X_2}(\tau) = -R_{X_2X_1}(\tau) = R_X(\tau)\sin\omega_0\tau - R_{X\hat{X}}(\tau)\cos\omega_0\tau $$

and

$$ S_{X_1X_2}(\omega) = -S_{X_2X_1}(\omega) = j[S_Q(-\omega) - S_Q(\omega)] =
   \begin{cases} j[S_X(\omega - \omega_0) - S_X(\omega + \omega_0)], & |\omega| < W \\
                 0, & |\omega| > W \end{cases} \tag{3.91} $$

It is easy to verify that:

$$ E(|X_1(t)|^2) = E(|X_2(t)|^2) $$

because $S_{X_1}(\omega) = S_{X_2}(\omega)$. Similarly, it is easy to verify
that:

$$ E(|X(t)|^2) = E(|X_1(t)|^2) = E(|X_2(t)|^2) $$

The representation given by (3.81) and (3.82) of $X(t)$ and $\hat{X}(t)$ is
known as the quadrature component representation.
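The quadrature relations can be illustrated on a single deterministic tone, for which the Hilbert transform is known in closed form: the system H(w) = -j sgn(w) turns cos into sin. The Python sketch below assumes a tone offset by `DELTA` from the carrier `W0`; these constants, the phase `PHI`, and the function names are illustrative assumptions, and a deterministic tone stands in here for a sample function of a band-pass process.

```python
import math


W0 = 50.0     # assumed carrier frequency (rad/s)
DELTA = 3.0   # assumed offset of the tone from the carrier (rad/s)
PHI = 0.4     # assumed phase (rad)


def x(t):
    return math.cos((W0 + DELTA) * t + PHI)


def x_hat(t):
    # Hilbert transform of the tone: H(w) = -j sgn(w) turns cos into sin.
    return math.sin((W0 + DELTA) * t + PHI)


def quadrature_components(t):
    """X1 and X2 computed from X and its Hilbert transform, Eqs. (3.79), (3.80)."""
    c, s = math.cos(W0 * t), math.sin(W0 * t)
    x1 = x(t) * c + x_hat(t) * s
    x2 = x(t) * s - x_hat(t) * c
    return x1, x2


for t in (0.0, 0.13, 1.7):
    x1, x2 = quadrature_components(t)
    rebuilt = x1 * math.cos(W0 * t) + x2 * math.sin(W0 * t)  # Eq. (3.81)
    # x1 should equal the slowly varying cos(DELTA * t + PHI).
    print(t, x(t), rebuilt, x1, math.cos(DELTA * t + PHI))
```

The components x1 and x2 vary only at the offset frequency, which is the low-pass behavior established in Eq. (3.90).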
EXERCISES

3.1 In an RC circuit, where $R = 1\ \Omega$ and $C = 1$ F, let the input
voltage source be a random process $X(t)$ such that $S_X(\omega) = K_0$ and the
output be the voltage across the capacitor, denoted by $Y(t)$. Then:

(a) Show the transfer function $H(j\omega)$ is given by:

$$ H(j\omega) = \frac{1}{1 + j\omega} $$

(b) Obtain $S_Y(\omega)$.

[Sketch: RC circuit with input $X(t)$, series resistance of 1 $\Omega$, and output $Y(t)$ across a capacitance of 1 F]


(c) If Jtjglr) = find the mean and the variance of K(f). 

(d) Obtain the variance of $Y(t)$ and comment on your result as $|\tau| \to \infty$.

3.2 Let $Y(t)$ be a process given by:


Y(t) = X(t + 1) 

where $X(t)$ is a zero-mean stationary random process. Show that:


S y (w) = 4 S x sin 2 


3.3 Determine the correlation function of the white noise $S(\omega)$ given by:


$\omega_1 < |\omega| < \omega_2$
0, otherwise
3.4 Repeat the previous problem for:


! %AT, I col < w e 
0, otherwise 

3.5 Determine the correlation function of the process $X(t)$ with its power
spectrum given by:


S(w) 


1 

(4 + u 1 ) 3 


3.6 In Problem 23, obtain S^u) and S^ujL 

3.7 The input $X(t)$ to a linear time-invariant system has the correlation function $R_X(\tau) = \delta(\tau)$. Assume the output is $Y(t)$. Then find $R_Y(\tau)$ and

as well as their corresponding power spectra, given:

(a) $h(t) = 1$ for $0 < t < T$ and zero otherwise.

(b) $h(t) = t \exp(-2t)$, $t > 0$.
CHAPTER 4
ESTIMATION THEORY

4.1 INTRODUCTION

Heuristically speaking, stochastic estimation is the operation of assigning a value to an unknown parameter based on contaminated (noisy) observations or measurements involving some function of the parameter. The noise contaminating the signal is assumed to have known statistical properties. The assigned value is called an estimate, and the system or function yielding the estimate is called the estimator. In many applications it is meaningful to assign a cost to an estimate representing a quantitative measure of how “good” an estimate is. This cost function should be a function of the estimation error, i.e., the difference between the true value and the estimated value. An optimal estimate is a function of the received observations (measurements) which is chosen to minimize the expected value of the cost function. An estimator yielding such an optimal estimate is called a Bayes estimator. A basic feature of the Bayes estimator is that it requires a knowledge of an a priori probability density function.

The present-day theories of estimation in the time domain, with few exceptions, owe their creation to Wiener and Kolmogorov. They basically considered the problem of “optimal” separation of a signal $s(t)$ which was contaminated by additive noise $n(t)$. We denote the contaminated signal by $Y(t)$ and call it the observation, i.e.,


$$Y(t) = s(t) + n(t)$$


We shall use the same notation for the signal whether it is a process or ensemble throughout this chapter.
Wiener studied the continuous-time problems and assumed that $s(t)$ and $n(t)$ were typical members drawn from ensembles of those functions which were wide-sense stationary with known first two moments. In addition, he assumed the availability of a semi-infinite observation and solved the problem of linear least square estimation, reducing it to the problem of solving a very difficult integral equation, the so-called “Wiener-Hopf equation.” That is, the optimal solution by Wiener's method would terminate with an integral equation whose solution would be needed to optimally separate $s(t)$ from the noise.


Even if one is willing to accept physically that the signal and noise be stationary and the observation be given over a semi-infinite interval, there remains a major problem: the computation of optimal solutions relies on the “Wiener-Hopf integral equation,” whose solution, with the exception of some academic problems, is extremely complicated and computationally infeasible. The statistical assumptions are also very stringent, which further limits the applicability to many practical problems such as those in orbital mechanics, space tracking, and countless others.

Kalman and Bucy revived estimation theory. They provided an alternative 
method to that of Wiener by assuming the availability of the observation over 
a finite interval and not limiting themselves to stationary processes. Kalman 
and Bucy considered the special class of processes which could be generated 
by a white noise forcing function serving as the input to a finite dimensional 
dynamic system (explained in the following sections). They assumed complete 
knowledge about the model in order to avoid certain very difficult problems. 

The primary interest in Kalman's estimation technique is in practical applications. We shall first discuss some basic results of mean-square estimation (quadratic mean) via the classical approach as well as some basic results of mean-square estimation via Kalman-Bucy filtering. The latter involves the solution of the so-called “state estimation problems” associated with finite-dimensional linear dynamic systems operating in a stochastic environment. A discussion of characterization of linear systems via the state variable approach will be carried out later in the chapter.


4.2 SYSTEMS AND MODELING

Physical systems are normally characterized by models consisting of idealized elements which can be defined mathematically. Choosing an appropriate model which characterizes all the features of the physical system is very important and generally very difficult. For example, if an unnecessarily complicated model is used, it may be impossible to analyze the model. On the other hand, if an extremely simple model is utilized, the results obtained by it may not be a realistic approximation to the physical phenomenon. Generally speaking, a model of the physical system may be mathematically expressed via integro-differential equations. Although in real life very few systems are linear, they can often be adequately approximated by linear models over an operating range of interest. Any treatment of a nonlinear system is extremely difficult; therefore, it is often necessary to assume that the system under study is a linear system. The general steps involved in the study of a physical system may be described by Figure 4-1.


[Fig. 4-1. Steps in the study of a physical system; legible block labels include LINEARIZATION, LAWS, and COMPUTATIONS.]


A convenient method of characterizing a linear system is by its input- 
output relationship. In general, a system may have many inputs and many 
outputs. 

The electric circuit given by Figure 4-2 can be considered as a system with a single input and a single output, where $e(t)$ is the input and $e_0(t)$ is the output.



Fig. 4-2. RC Electric Circuit

In a linear system, the variables $u(t)$ and $Y(t)$ can be related by

$$Y(t) = \int_{t_0}^{t} h(t,\lambda)\,u(\lambda)\,d\lambda, \qquad Y(t_0) = 0$$

if the system is causal and is at rest at $t_0$, where $h(t,\lambda)$ is called the system’s impulse response. If the system is characterized by a constant coefficient differential equation, then it can be shown that $h(t,\lambda) = h(t-\lambda)$.
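The input-output integral above can be evaluated numerically. The following is a minimal sketch, assuming the time-invariant case $h(t,\lambda) = h(t-\lambda)$ and an RC low-pass circuit with $RC = 1$ (an illustrative choice, as are the function names); it approximates the integral with the midpoint rule for a unit-step input applied at $t_0 = 0$.

```python
import math

def rc_impulse_response(t, rc=1.0):
    """Impulse response of a causal RC low-pass circuit: exp(-t/RC)/RC for t >= 0."""
    return math.exp(-t / rc) / rc if t >= 0.0 else 0.0

def output_at(t, u, h, t0=0.0, steps=2000):
    """Approximate y(t) = integral_{t0}^{t} h(t - lam) u(lam) d lam (midpoint rule)."""
    if t <= t0:
        return 0.0  # system at rest at t0
    d = (t - t0) / steps
    total = 0.0
    for k in range(steps):
        lam = t0 + (k + 0.5) * d
        total += h(t - lam) * u(lam)
    return total * d

def unit_step(lam):
    """Assumed test input: u(lam) = 1 for all lam >= t0."""
    return 1.0

y2 = output_at(2.0, unit_step, rc_impulse_response)
print(y2)  # close to the analytic step response 1 - exp(-2)
```

For this particular circuit the step response is known in closed form, $1 - e^{-t}$, which gives a quick check on the discretization.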





4.2.1 State Variable Characterization of a Linear System

The classical method of describing a linear system is by its impulse re- 
sponse and, if the system is also time-invariant, by its frequency domain 
transfer function. It should be emphasized that frequency domain analysis, 
although the most attractive, can only be utilized for time-invariant linear 
systems. In nonlinear and time-varying linear systems, the frequency domain
analysis cannot be utilized to advantage. Even in the time-invariant case the 
frequency domain transfer function suffers from the major disadvantage that 
all the initial conditions of the system are ignored. The analysis and the 
synthesis of linear systems, time-varying or not, is a formidable task for multi- 
variable systems (vector input-output), and determining the interrelated effects 
in a multivariable system is a complicated and exhausting process. 

The modern alternative to classical methods of describing a system is the “state variable” technique, which is a matrix method for handling multivariable systems. The technique aids conceptual thinking and provides a unifying basis for quantitative information about the system. The state of the system is defined in terms of a minimal set of variables $X_1(t), \ldots, X_n(t)$ such that information about these variables at time $t = t_0$, along with the input $u(t)$ for all $t > t_0$, uniquely determines the output $Y(t)$ for $t > t_0$.

The state is the answer to the following question: “Suppose $u(t)$ for $t > t_0$ is known. What additional information is needed to completely obtain $Y(t)$ for $t > t_0$?” We shall discuss the concept of state later in the chapter and give examples of its use.


4.3 MEAN-SQUARE ESTIMATION

In this section we shall construct a mean-square performance index in order 
to carry out the estimation process. Throughout this section, unless specified 
otherwise, the norm of a random vector X is defined as 


$$\|X\|^2 = X'X$$

where $X$ is a column vector, and the prime denotes the transpose.

Now let us specify the estimation problem. Let two random vectors $X$ and $Y$ of dimensions $n$ and $m$, respectively, be jointly distributed. Suppose $Y$ is a measurement which in general has been contaminated by noise. It is intuitively obvious that the received measurement, $Y$, should improve the information about $X$. That is, if we had an a priori guess about $X$, knowledge of $Y$ should improve the information about $X$. To be more specific, let us ask ourselves the question, “Given the measurement $Y = y$, what is the best estimate of $X$, denoted as $\hat{X}(Y)$, corresponding to the random vector $X$?” The concept of “best” has not been defined, but the most popular criterion is the mean-square estimate. Thus we are seeking to obtain the estimate $\hat{X}(y)$, which is the function of the measurement $Y = y$ such that:


$$E\left[\|X - \hat{X}(y)\|^2 \mid Y = y\right] = \min E\left[\|X - l\|^2 \mid Y = y\right] \qquad (4.1)$$

over all random vectors $l$.

The criterion given by (4.1) is referred to in the literature by the following 
names: 

(1) Minimum mean square estimate 

(2) Least square estimate 


The solution of (4.1) is relatively simple and is given by:

$$\hat{X}(y) = E[X \mid Y] \qquad (4.2)$$


Hence, we are assuming a cost function associated with the uncertainty of $X$. We choose $\hat{X}(y)$ as the best estimate, given that $Y$ has the value $y$, under criterion (4.1), and claim it is given by (4.2).

Let us verify (4.2).

$$\begin{aligned} E\left[\|X - l\|^2 \mid Y\right] &= E\left[(X-l)'(X-l) \mid Y\right] \\ &= E\left[X'X - l'X - X'l + l'l \mid Y\right] \\ &= E\left[\|l - E[X \mid Y]\|^2\right] + E\left[\|X\|^2 \mid Y\right] - \|E[X \mid Y]\|^2 \end{aligned}$$


From the above equation, the only term that has $l$ involved in it is the first term, and the right-hand side of the above equation is minimum if and only if $E\left[\|l - E[X \mid Y]\|^2\right] = 0$, which implies that the best choice of $l$ is:

$$l = E[X \mid Y] = \hat{X}(Y) \qquad (4.3)$$


It is very important to mention that, in general, $\hat{X}(Y)$ is a random vector, since $\hat{X}(\cdot)$ is a function of the random vector $Y$. However, for each measurement $Y = y$, the corresponding $\hat{X}(y)$ is a deterministic outcome of that random vector.

Let $g$ be a function of $Y$ from $R^m \to R^n$ and assume $f_Y(y) \neq 0$. From condition (4.1) it is obvious that:


$$E\left[\|X - \hat{X}(Y)\|^2 \mid Y\right] \le E\left[\|X - g(Y)\|^2 \mid Y\right] \qquad (4.4)$$


because we substitute $g(Y)$ for $l$ in that equation.
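The claim that the conditional mean minimizes the mean-square error can be checked on a small discrete example. The sketch below is an illustration only: the joint probability table and the helper names are hypothetical, and scalar $X$ and $Y$ are used for brevity. It computes $E[X \mid Y = y]$ for each $y$ and compares its mean-square error with that of another function of $Y$.

```python
# Hypothetical joint pmf of (X, Y): keys are (x, y), values are probabilities.
joint = {
    (0.0, 0): 0.10, (1.0, 0): 0.30,
    (0.0, 1): 0.25, (2.0, 1): 0.15,
    (1.0, 2): 0.05, (3.0, 2): 0.15,
}

def conditional_mean(pmf):
    """Return {y: E[X | Y = y]} for a discrete joint pmf {(x, y): p}."""
    p_y, sum_x = {}, {}
    for (x, y), p in pmf.items():
        p_y[y] = p_y.get(y, 0.0) + p
        sum_x[y] = sum_x.get(y, 0.0) + x * p
    return {y: sum_x[y] / p_y[y] for y in p_y}

def mse(pmf, g):
    """E[(X - g(Y))^2] for an estimator given as a dict {y: g(y)}."""
    return sum(p * (x - g[y]) ** 2 for (x, y), p in pmf.items())

x_hat = conditional_mean(joint)
best = mse(joint, x_hat)

# Any other function of Y does at least as badly, e.g. a shifted estimator.
shifted = {y: v + 0.3 for y, v in x_hat.items()}
worse = mse(joint, shifted)
print(x_hat, best, worse)
```

Because the error of the conditional mean has zero conditional mean, shifting the estimator by a constant $c$ raises the mean-square error by exactly $c^2$.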

Now, let us take the expected value of both sides of Eq. (4.4):


$$E\left(E\left[\|X - \hat{X}(Y)\|^2 \mid Y\right]\right) \le E\left(E\left[\|X - g(Y)\|^2 \mid Y\right]\right) \qquad (4.5)$$


Utilizing the identities:*


$$E\left(E\left[\|X - \hat{X}(Y)\|^2 \mid Y\right]\right) = E\left[\|X - \hat{X}(Y)\|^2\right]$$


and


$$E\left(E\left[\|X - g(Y)\|^2 \mid Y\right]\right) = E\left[\|X - g(Y)\|^2\right]$$


we obtain:


$$E\left[\|X - \hat{X}(Y)\|^2\right] \le E\left[\|X - g(Y)\|^2\right] \qquad (4.6)$$


Equation (4.6) states a very significant result: the estimate $\hat{X} = E[X \mid Y]$ is the best solution for the unconstrained case. Thus, the result can be appropriately summarized via a theorem.

Theorem 1

For two jointly distributed random vectors $X$ and $Y$ with joint probability density function $f_{XY}(x,y)$ and $f_Y(y) \neq 0$, the estimate that minimizes $E\left[\|X - g(Y)\|^2\right]$ is given by:


$$\hat{X}(Y) = E[X \mid Y] = g(Y) \qquad (4.7)$$


*We are using the general result $E\left(E[h(X,Y) \mid Y]\right) = E[h(X,Y)]$.
Remark 1. If $e = X - \hat{X}$, where $\hat{X} = E(X \mid Y)$, then $e$ is uncorrelated with any mapping $g(Y)$ of the random vector $Y$. Mathematically we can write:

$$E[e\,g'(Y)] = 0$$

where the prime denotes the transpose. The reader is advised to verify this equation.


4.4 LINEAR ESTIMATE 

The estimate just obtained is indeed the best with respect to the mean-square cost function. However, $\hat{X}(Y)$ is a nonlinear function of $Y$ (for the general case), and it is extremely difficult to obtain the exact relationship. Since very often $f_{XY}(x,y)$ is not available, $E(X \mid Y)$ may not be achievable either.

Now we shall do the next best thing and introduce a constraint that $\hat{X}(Y)$ has a linear form in $Y$. That is,


$$\hat{X} = AY + b \qquad (4.8)$$


where $A$ is an $n \times m$ matrix and $b$ is an $n$-vector. With the constraint (4.8) on Eq. (4.7), we get:


$$E\left[\|X - AY - b\|^2\right] = E\left[(X - AY - b)'(X - AY - b)\right] \qquad (4.9)$$


Now we can choose $A$ and $b$ (parameters) such that Eq. (4.9) is minimized. Let us denote the optimal values of $A$ and $b$ as $A_0$ and $b_0$; thus $\hat{X}(Y)$ shall be given by:


$$\hat{X}(Y) = A_0 Y + b_0 \qquad (4.10)$$


Without any loss of generality, assume that $X$ and $Y$ have zero mean. To minimize the cost function given by Eq. (4.9), we shall calculate $A_0$ and $b_0$ in the usual manner by setting:


$$\frac{\partial}{\partial b} E\left[\|X - AY - b\|^2\right] = \frac{\partial}{\partial b} E\left[(X - AY - b)'(X - AY - b)\right] = 0$$

and


$$\frac{\partial}{\partial A} E\left[\|X - AY - b\|^2\right] = \frac{\partial}{\partial A} E\left[(X - AY - b)'(X - AY - b)\right] = 0$$

From the first equation, we find:

$$b_0 = 0$$

and, from the second,

$$A_0 = E(XY')\left[E(YY')\right]^{-1} = C_{XY} C_Y^{-1} \qquad (4.11)$$

since $X$ and $Y$ are zero mean. Hence,

$$\hat{X}(Y) = C_{XY} C_Y^{-1} Y \qquad (4.12)$$

Now, if $X$ and $Y$ do not have zero mean, the random variables $X - m_X$ and $Y - m_Y$ have zero means. Applying (4.12) yields:

$$\hat{X} - m_X = C_{XY} C_Y^{-1} (Y - m_Y)$$

or, equivalently,

$$\hat{X}(Y) = m_X + C_{XY} C_Y^{-1} (Y - m_Y) \qquad (4.13)$$
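Equation (4.13) is straightforward to evaluate once the means and covariances are known. The following is a minimal sketch for a scalar $X$ and a vector $Y$, written in plain Python; the helper names and the numerical values (two noisy measurements of the same quantity) are assumptions made for illustration.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting (m square)."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def linear_estimate(m_x, m_y, c_xy, c_y, y):
    """Return m_x + C_XY C_Y^{-1} (y - m_y), Eq. (4.13), for scalar X, vector Y."""
    # C_Y is symmetric, so solving C_Y a = C_XY' gives the single row of A_0.
    gain = solve(c_y, c_xy)
    return m_x + sum(g * (yi - mi) for g, yi, mi in zip(gain, y, m_y))

# Hypothetical model: Y1 = X + N1, Y2 = X + N2, with var(X) = var(N1) = var(N2) = 1,
# all mutually uncorrelated, and E[X] = 2.
m_x, m_y = 2.0, [2.0, 2.0]
c_xy = [1.0, 1.0]
c_y = [[2.0, 1.0], [1.0, 2.0]]
estimate = linear_estimate(m_x, m_y, c_xy, c_y, [3.0, 4.0])
print(estimate)
```

In this example the gain row is $[1/3,\ 1/3]$, so the estimate pulls the prior mean toward the average of the two measurements.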

In the next section, we shall show that the best estimate can be derived by a 
different approach, the so-called “orthogonality principle.” The orthogonality 
principle is one of the most important ideas in linear estimation theory. Let 
us define an important concept. 

Definition 1 

An estimate $\hat{X}(Y)$ is defined to be unbiased if:

$$E\hat{X}(Y) = X \qquad (4.14)$$
That is, the average (with respect to $f_Y(y)$) of the estimate is equal to the true value. This definition is motivated by the fact that if we are receiving a perfect measurement $Y$ (i.e., $Y$ is not random), then $\hat{X}(Y)$ is not a random variable, and


$$E\hat{X}(Y) = \hat{X}(y) = X$$

That is, if there were no measurement errors, and thus no uncertainty, then the estimate is identical to the true value. Also for the unbiased estimate, we can write:

$$E\left[(X - \hat{X})(X - \hat{X})'\right] = E\left[(\hat{X} - E\hat{X})(\hat{X} - E\hat{X})'\right] = C_{\hat{X}} = E(ee') \qquad (4.15)$$


where


$$e = X - \hat{X}$$


4.5 ORTHOGONALITY PRINCIPLE 

In this section we shall assume, without loss of generality, that all parameters are of zero mean, unless specified otherwise. For example, if the mean of $X$ is non-zero, then we shall introduce a new random variable $\tilde{X} = X - m_X$ which will have zero mean (as in the previous section).


The concept of orthogonality is extremely important in the theory of linear mean-square estimation. We shall show that the orthogonality principle will serve as a necessary and sufficient condition that the linear estimate $\hat{X}$ be the best. The orthogonality principle states that if the measurement $Y$ is orthogonal to the error $e = X - \hat{X}$, i.e.,


$$E\left[(X - \hat{X})Y'\right] = E[eY'] = 0 \qquad (4.16)$$


then the estimate $\hat{X}$ is the best linear m.s.e.

Definition 2

An estimate $\hat{X}$ is optimal if it is the best linear mean-square estimate.
4.5.1 Discussion of Vector Spaces

The only difference between those spaces that are generated by random vectors and those that are nonprobabilistic is due to the way we define the inner product (see Appendix B).

Let $\mathcal{V}$ be a vector space generated by the set of all linear combinations of the random vectors $X_1, X_2, \ldots, X_n$. Let the inner product between two vectors $X$ and $Y \in \mathcal{V}$ be defined as:

$$(X, Y) \triangleq E(X'Y) \qquad (4.17)$$


The norm generated from this inner product is defined by $(X, X)^{1/2}$. Since the norm under this definition of the inner product is different from the norm $\|\cdot\|$ in the previous section, we shall denote it by $\|\cdot\|_{qm}$, where it is defined via:


$$\|X\|_{qm}^2 \triangleq (X, X) = E(X'X) \qquad (4.18)$$


Assume that the $n$ vectors $X_1, \ldots, X_n$ are linearly independent, and let $M$ be a subspace in $\mathcal{V}$. Then we know that every vector $X$ in $\mathcal{V}$ can be uniquely decomposed into the sum of two vectors, $\eta_1$ belonging to $M$ and $\eta_2$ belonging to the orthogonal complement of $M$, denoted by $M^\perp$. Thus,

$$X = \eta_1 + \eta_2$$


where


$$\eta_1 \in M \quad \text{and} \quad \eta_2 \in M^\perp.$$


Recall that the projection of $X$ on $M$, denoted by $P$, is given by:


$$PX = \eta_1$$


and the projection of $X$ on $M^\perp$, denoted by $Q$, is given by:


$$QX = \eta_2$$

where


$$P + Q = I$$

and $I$ is the identity operator. Hence,

$$Q = I - P \qquad (4.19)$$
Now we can use the concept of a vector space to obtain a significant result.

Theorem 2

Let $X$ be a random vector $\in \mathcal{V}$, and let $Z$ be a vector $\in M$. Then

$$\|X - Z\|_{qm}^2 = E\left[(X - Z)'(X - Z)\right]$$

reaches its minimum if and only if

$$Z = PX$$


Proof

For any $X \in \mathcal{V}$, we have

$$X = \eta_1 + \eta_2$$

where $\eta_1 \in M$, $\eta_2 \in M^\perp$, and $\eta_1 = PX$, $\eta_2 = (I - P)X$. We shall also have:

$$\|X - Z\|_{qm}^2 = E\left[(X - Z)'(X - Z)\right] = E\left\{\left[(X - \eta_1) + (\eta_1 - Z)\right]'\left[(X - \eta_1) + (\eta_1 - Z)\right]\right\} \qquad (4.20)$$


In the above equation $X - \eta_1$ is orthogonal to $M$, i.e., $X - \eta_1 \in M^\perp$, while $\eta_1$, $Z$, and $\eta_1 - Z$ are all members of $M$. Utilizing these facts in Eq. (4.20) yields:

$$\|X - Z\|_{qm}^2 = E\left[(X - \eta_1)'(X - \eta_1)\right] + E\left[(\eta_1 - Z)'(\eta_1 - Z)\right] = \|X - \eta_1\|_{qm}^2 + \|\eta_1 - Z\|_{qm}^2 \qquad (4.21)$$

From the above equation, it is obvious that:

$$\|X - Z\|_{qm}^2 \ge \|X - \eta_1\|_{qm}^2 \qquad (4.22)$$

since $\|\eta_1 - Z\|_{qm}^2 \ge 0$. Thus, the inequality in (4.22) becomes an equality if and only if

$$Z = \eta_1 = PX$$
4.5.2 Application of Theorem 2

Assume that we have received $m$ measurements that are linearly independent, say, the random vectors $Y_1, Y_2, \ldots, Y_m$. Let $M$ be the vector space spanned (generated) by the set of all linear combinations of $Y_1, \ldots, Y_m$.

According to the theorem, $\|X - Z\|_{qm}^2$ is minimized if and only if

$$Z = PX \in M$$


If $Z \in M$, then $Z$ can be written as a linear combination of $Y_1, \ldots, Y_m$.


Claim 1. Let $Y_1, Y_2, \ldots, Y_m$ be the measurement vectors (observations), and let $M$ denote the vector space generated by these measurement vectors. Then the vector $\hat{X}$ is an optimal estimate of $X$ if and only if $\hat{X}$ is the projection of $X$ onto $M$.

Claim 2. The vector $\hat{X}$ is an optimal estimate of $X$ if and only if the error $e = X - \hat{X}$ is orthogonal to the observation vectors $Y_1, Y_2, \ldots, Y_m$, i.e.,


$$E\left[(X - \hat{X})Y_i'\right] = E[eY_i'] = 0, \quad \text{for } i = 1, \ldots, m$$


Claim 2 follows from Claim 1, because if $\hat{X}$ is the projection of $X$ onto $M$, then $X - \hat{X} \in M^\perp$.
Example 1

In Section 4.3, we derived the optimal estimate $\hat{X}$ as:

$$\hat{X} = C_{XY} C_Y^{-1} Y$$

when we had one observation vector only (see Eq. 4.12). Use the orthogonality principle to derive the same result.

Solution

$$E\left[(X - \hat{X})Y'\right] = 0$$

Since we know $\hat{X} = AY$, where $A$ is to be determined, then

$$E\left[(X - AY)Y'\right] = E[XY' - AYY'] = 0$$

This is true if and only if

$$E(XY') = C_{XY} = A\,E(YY') = A\,C_Y$$

Assuming that the inverse $C_Y^{-1}$ exists, then it is trivial to see

$$A = C_{XY} C_Y^{-1}$$

as asserted.

Example 2

Let both $X$ and $Y$ be random variables such that:

$$m_n = E(Y^n) \quad \text{and} \quad E(Y) = 0$$

Show the best linear m.s.e. of $X = Y^2$ is given by:

$$\hat{X} = m_2 + \frac{m_3}{m_2}\,Y$$

Solution


We know the mean of $X$ is:


$$m_X = E(X) = E(Y^2) \neq 0$$

Thus, our estimate $\hat{X}$ shall have the form:

$$\hat{X} = aY + b$$

where we should minimize


$$E|X - (aY + b)|^2$$

with respect to $a$ and $b$ as in Section 4.3. However, this approach is relatively lengthy.

Using the orthogonality principle, the solution is much more direct. Let $Z$ be defined such that:

$$Z = X - E(X) = Y^2 - m_2 \qquad (4.23)$$

$Z$ has zero mean, since $E(Z) = EX - EX = 0$. Now we can use the orthogonality principle:

$$E[(Z - \hat{Z})Y] = 0 \qquad (4.24)$$


where


$$\hat{Z} = AY \qquad (4.25)$$


Hence, from (4.24) and (4.25),

$$E[(Z - AY)Y] = 0$$


which implies:


$$A = \frac{E(ZY)}{E(Y^2)} \qquad (4.26)$$

Substituting $Z$ from Eq. (4.23), we get:


$$A = \frac{E\left[(Y^2 - m_2)Y\right]}{m_2} = \frac{m_3}{m_2}$$


Hence,


$$\hat{Z} = \hat{X} - m_2 = AY = \frac{m_3}{m_2}\,Y$$


From the above, it is obvious that:

$$\hat{X} = m_2 + \frac{m_3}{m_2}\,Y$$
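The result of Example 2 can be checked numerically for a particular distribution. The sketch below assumes a hypothetical three-point, zero-mean distribution for $Y$; it computes the moments, forms the coefficients $a = m_3/m_2$ and $b = m_2$, and confirms that nearby choices of $(a, b)$ give a larger mean-square error.

```python
# Hypothetical discrete distribution for Y with E[Y] = 0 and nonzero skew.
values = [-1.0, 0.0, 2.0]
probs = [0.4, 0.4, 0.2]

def moment(n):
    """m_n = E[Y^n] for the distribution above."""
    return sum(p * v ** n for v, p in zip(values, probs))

def mse(a, b):
    """E[(Y^2 - (a Y + b))^2]: error of the linear estimate a Y + b of X = Y^2."""
    return sum(p * (v ** 2 - (a * v + b)) ** 2 for v, p in zip(values, probs))

m1, m2, m3 = moment(1), moment(2), moment(3)
a_opt, b_opt = m3 / m2, m2          # coefficients derived via orthogonality
best = mse(a_opt, b_opt)

# Perturbing either coefficient should not reduce the error.
neighbours = [mse(a_opt + da, b_opt + db)
              for da in (-0.1, 0.1) for db in (-0.1, 0.1)]
print(a_opt, b_opt, best, min(neighbours))
```

For a symmetric distribution $m_3 = 0$, and the best linear estimate of $Y^2$ reduces to the constant $m_2$.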


4.6 LINEAR MEAN-SQUARE ESTIMATE OF
CONTINUOUS STOCHASTIC SIGNALS

As discussed in the introduction, Wiener and Kolmogorov formulated the problem of optimal separation of a signal $s(t)$ from noise $n(t)$, where the continuous measurement $Y(t)$ is given by:

$$Y(t) = s(t) + n(t)$$

where both $s(t)$ and $n(t)$ are assumed to be wide-sense stationary processes.

We shall use the same notation for the ensemble and the process. The purpose of the Wiener-Kolmogorov (W-K) theory is to extract the signal from the noise, that is, to derive an optimal estimate of $s(t)$, denoted by $\hat{s}(t)$, where the performance index is, as before, the mean square.

Let us consider a more general case than $s(t)$, namely, $s(t + \alpha)$. Let $\hat{s}(t + \alpha)$ be the corresponding optimal estimate and let the error $e(t + \alpha)$ be defined as:


$$e(t + \alpha) = s(t + \alpha) - \hat{s}(t + \alpha)$$


There are three important cases:

(a) If $\alpha > 0$, then $\hat{s}(t + \alpha)$ is called the (optimal) prediction of $s(t + \alpha)$.

(b) If $\alpha = 0$, then $\hat{s}(t)$ is called the (optimal) filter for $s(t)$.

(c) If $\alpha < 0$, then $\hat{s}(t + \alpha)$ is called the (optimal) smoothing of $s(t + \alpha)$.


4.7 THE WIENER-KOLMOGOROV THEORY

The Wiener-Kolmogorov (W-K) theory utilizes the best linear mean-square-optimality criteria applied to stochastic signals in a manner to be specified. The W-K theory emphasizes the time-domain analysis. The smoothing-and-prediction problem was first treated by Wiener and almost simultaneously by Kolmogorov. To make Wiener filtering feasible, some assumptions concerning the signal $s(t)$, the noise $n(t)$, and the measurement

$$Y(t) = s(t) + n(t) \qquad (4.27)$$

are made. We shall confine ourselves to one-dimensional signals throughout this section for the sake of simplicity.


Assume that $s(t)$ and $n(t)$ are wide-sense stationary processes of zero mean, such that $s(t)$ and $n(t)$ are uncorrelated, i.e.,

$$E[s(t)\,n(t)] = 0$$

Now let us assume the measurement $Y(t)$ is the input of a linear time-invariant system, characterized by the impulse function $h(t)$ (see sketch).


[Sketch: input $Y(t)$, linear system $h(t)$, output $Y_0(t)$.]


The output signal $Y_0(t)$ can be written as:


$$Y_0(t) = \int_{-\infty}^{\infty} h(\tau)\,Y(t - \tau)\,d\tau \qquad (4.28)$$


Note that $Y_0(t)$ is a linear function of $Y(\cdot)$.


Now the objective is to select the appropriate impulse function (denoted by $\hat{h}(t)$) such that we minimize the mean square of:


$$E[e^2(t)] = E\left\{[Y_0(t) - s(t + \alpha)]^2\right\} \qquad (4.29)$$

where


$$e(t) = Y_0(t) - s(t + \alpha) \qquad (4.30)$$


and $\alpha$ is a fixed constant.

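The impulse response $\hat{h}(t)$ that minimizes the performance index given by (4.29) gives rise to an optimal solution. The filter with impulse response $\hat{h}(\cdot)$ is called the optimal filter.

$E[e^2(t)]$ can be obtained in terms of the covariances of $s(t)$ and $n(t)$, since

$$E[e^2(t)] = E\left\{[Y_0(t) - s(t + \alpha)]^2\right\} = E[Y_0^2(t)] + E[s^2(t + \alpha)] - 2E[Y_0(t)\,s(t + \alpha)] \qquad (4.31)$$

The first term of the above can be written as:


$$E[Y_0^2(t)] = E\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)\,h(\sigma)\,Y(t - \tau)\,Y(t - \sigma)\,d\tau\,d\sigma \qquad (4.32)$$


Assume that the expected value can operate inside the integrand. Then, utilizing the property of stationarity, we can verify:

$$\begin{aligned} E[Y(t - \tau)\,Y(t - \sigma)] &= E[Y(\tau)\,Y(\sigma)] = R_Y(\tau - \sigma) \\ &= E\left\{[s(\tau) + n(\tau)][s(\sigma) + n(\sigma)]\right\} \\ &= R_s(\tau - \sigma) + R_n(\tau - \sigma) \end{aligned} \qquad (4.33)$$


Hence,


$$E[Y_0^2(t)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)\,h(\sigma)\left[R_s(\tau - \sigma) + R_n(\tau - \sigma)\right]d\tau\,d\sigma \qquad (4.34)$$


Similarly, we can verify:


$$E[Y_0(t)\,s(t + \alpha)] = \int_{-\infty}^{\infty} h(\tau)\,R_s(\tau + \alpha)\,d\tau \qquad (4.35)$$


and remembering that $R_s(0) = E[s^2(t + \alpha)]$ and substituting this and (4.34) and (4.35) into (4.31) yields:


$$E[e^2(t)] = R_s(0) - 2\int_{-\infty}^{\infty} h(\tau)\,R_s(\tau + \alpha)\,d\tau + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)\,h(\sigma)\left[R_s(\tau - \sigma) + R_n(\tau - \sigma)\right]d\tau\,d\sigma \qquad (4.36)$$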

The above equation demonstrates that the optimal solution depends on the autocorrelation (covariance) functions only. It should be emphasized that this is an extremely important result, because the optimal filter $\hat{h}(t)$ is obtained from the knowledge of $R_s(\cdot)$ and $R_n(\cdot)$ and not directly from $s(t)$ and $n(t)$. Hence, there are infinitely many signals that give rise to the optimum solution, all having the same autocorrelation (covariance) function. Wiener minimized $E[e^2(t)]$ given by Eq. (4.36) via the calculus of variations; we shall use the orthogonality principle given by Theorem 1. We can now state the solution for the optimal filter by the following theorem.

Theorem 3 (Wiener-Hopf)

$E[e^2(t)]$ given by Eq. (4.29) is minimized if and only if $\hat{h}(t)$ can be obtained from the solution of the equation:


$$R_s(\tau + \alpha) = R_{sY}(\tau + \alpha) = \int_{-\infty}^{\infty} \hat{h}(\sigma)\,R_Y(\tau - \sigma)\,d\sigma = \int_{-\infty}^{\infty} \hat{h}(\sigma)\left[R_s(\tau - \sigma) + R_n(\tau - \sigma)\right]d\sigma \qquad (4.37)$$

Thus, the optimal solution $\hat{s}(t)$ is given via:


$$\hat{s}(t) = \int_{-\infty}^{\infty} \hat{h}(\lambda)\,Y(t - \lambda)\,d\lambda \qquad (4.38)$$


Equation (4.37) is known as the Wiener-Hopf equation.

Proof

We have proven the orthogonality principle for the discrete case. In what follows we shall show that the solution of Eq. (4.37) is equivalent to the solution of:

$$E[e(t)\,Y(\theta)] = E\left[(s(t + \alpha) - \hat{s}(t))\,Y(\theta)\right] = 0, \quad \text{for all } \theta \le t \qquad (4.39)$$


where $\hat{s}(t)$ is given by Eq. (4.38), and $\theta = t - \tau$ with $-\infty < \tau < \infty$. Let us use the notation $\hat{s}(t_1 \mid t)$ for the optimal estimate in the sense of Eq. (4.29), given the observation $Y(\cdot)$ over $(-\infty, t]$, where $t_1 = t + \alpha$.

To prove (4.39), let $\mathcal{V}$ be the space generated by the random variable $\{s(t_1)\}$. Let $Q \subset \mathcal{V}$ be a space generated from $\{Y(t)\}$, given by elements:


$$q(t_1) = \int h(t_1 - \tau)\,Y(\tau)\,d\tau$$


where $h(\cdot)$ is a continuously differentiable function. Utilizing Theorem 1, the norm (mean square)

$$\|s(t_1) - q(t_1)\|^2$$

is minimized if $q = Ps \in Q$, and from Claim 2:

$$E\left[(s(t_1) - \hat{s}(t_1 \mid t))\,q(t_1)\right] = 0$$


which yields:


$$E\left[e(t_1 \mid t)\,q(t_1)\right] = \int R_{eY}(t_1 - \tau \mid t)\,h(t_1 - \tau)\,d\tau = 0$$


which proves the orthogonality condition.
4.7.1 Discussion

The Wiener-Hopf equation (4.37) will provide the solution for $\hat{h}(t)$. However, obtaining $\hat{h}(t)$ from the integral equation is extremely difficult. Assuming the observation $Y(t)$ is available over the interval $(-\infty, t)$, we can utilize the frequency domain approach to solve for $\hat{h}(t)$ by obtaining $\hat{H}(j\omega)$.

It turns out that $\hat{h}(t)$ does not correspond to a causal (realizable) system, since, in general, $\hat{h}(t)$ is non-zero for $t < 0$. The condition of realizability is given by the Paley-Wiener condition (a sufficiency condition), which states that a system with the transfer function $H(j\omega)$ is realizable only if

$$\int_{-\infty}^{\infty} \frac{\left|\ln|H(j\omega)|\right|}{1 + \omega^2}\,d\omega < \infty \qquad (4.40)$$

The linear system described above will, in general, violate condition (4.40).

If we drop the condition of realizability for the moment, we obtain (to be proven) $\hat{H}(j\omega)$ as:


$$\hat{H}(j\omega) = \frac{S_{sY}(\omega)\exp(j\omega\alpha)}{S_Y(\omega)} = \frac{S_s(\omega)\exp(j\omega\alpha)}{S_Y(\omega)} \qquad (4.41)$$


Hence, $\hat{h}(t)$ can be obtained as the inverse Fourier transform of $\hat{H}(j\omega)$. Thus,


$$\hat{h}(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_s(\omega)\exp[j\omega(t + \alpha)]}{S_Y(\omega)}\,d\omega \qquad (4.42)$$


Let $C$ denote $E[e^2(t)]$ and $C^\circ$ its minimum over all $h(\cdot)$. We shall also prove that:


$$C^\circ = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_s(\omega)\,S_n(\omega)}{S_Y(\omega)}\,d\omega \qquad (4.43)$$


Remark 2. If $s(t)$ and $n(t)$ are uncorrelated, then


$$R_{sY}(\tau) = R_s(\tau), \qquad S_{sY}(\omega) = S_s(\omega) \qquad (4.44)$$
Remark 3. Utilizing the orthogonality principle (see Eq. 4.39), we can verify that:


$$C^\circ = E\left\{[s(t) - \hat{s}(t)]^2\right\} = E[s^2(t)] - E[s(t)\,\hat{s}(t)] - E\left\{[s(t) - \hat{s}(t)]\,\hat{s}(t)\right\}$$


Thus,


$$C^\circ = R_s(0) - \int_{-\infty}^{\infty} R_{sY}(\tau)\,\hat{h}(\tau)\,d\tau \qquad (4.45)$$


Theorem 4

The optimal transfer function $\hat{H}(j\omega)$ corresponding to the impulse response $\hat{h}(t)$ is given by Eq. (4.41), and $C^\circ$ given by Eq. (4.45) is the minimum (optimal) performance index.

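Proof

From the Wiener-Hopf equation, we have:


$$R_{sY}(\tau + \alpha) = R_s(\tau + \alpha) = \int_{-\infty}^{\infty} \hat{h}(\sigma)\,R_Y(\tau - \sigma)\,d\sigma$$


Now let us take the Fourier transform of the above:

$$\begin{aligned} \exp(j\omega\alpha)\,S_{sY}(\omega) &= \int_{-\infty}^{\infty} R_{sY}(\tau + \alpha)\exp(-j\omega\tau)\,d\tau \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \hat{h}(\sigma)\,R_Y(\tau - \sigma)\exp(-j\omega\tau)\,d\sigma\,d\tau \\ &= \int_{-\infty}^{\infty} \hat{h}(\theta)\exp(-j\omega\theta)\,d\theta \int_{-\infty}^{\infty} R_Y(\lambda)\exp(-j\omega\lambda)\,d\lambda \\ &= \hat{H}(j\omega)\,S_Y(\omega) \end{aligned}$$

Thus,


$$\hat{H}(j\omega) = \frac{S_{sY}(\omega)\exp(j\omega\alpha)}{S_Y(\omega)} = \frac{S_s(\omega)\exp(j\omega\alpha)}{S_Y(\omega)}$$


as asserted.

To prove Eq. (4.43), let us calculate $C = E[e^2(t)]$ via the frequency domain. From Eq. (4.36), we know:


$$C = R_s(0) - 2\int_{-\infty}^{\infty} h(\tau)\,R_s(\tau + \alpha)\,d\tau + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)\,h(\sigma)\,R_Y(\tau - \sigma)\,d\tau\,d\sigma$$


Thus, $C$ can be rewritten as:

$$\begin{aligned} C &= \frac{1}{2\pi}\int S_s(\omega)\,d\omega - \frac{2}{2\pi}\int h(\tau)\int S_s(\omega)\exp[j\omega(\tau + \alpha)]\,d\omega\,d\tau \\ &\quad + \frac{1}{2\pi}\int\!\!\int h(\tau)\,h(\sigma)\int S_Y(\omega)\exp[-j\omega(\sigma - \tau)]\,d\omega\,d\sigma\,d\tau \\ &= \frac{1}{2\pi}\int S_s(\omega)\,d\omega - \frac{1}{\pi}\int\left[\int h(\tau)\exp(j\omega\tau)\,d\tau\right] S_s(\omega)\exp(j\omega\alpha)\,d\omega \\ &\quad + \frac{1}{2\pi}\int\left[\int h(\tau)\exp(j\omega\tau)\,d\tau\right]\left[\int h(\sigma)\exp(-j\omega\sigma)\,d\sigma\right] S_Y(\omega)\,d\omega \\ &= \frac{1}{2\pi}\int\left\{S_s(\omega)\left[1 - 2H^*(j\omega)\exp(j\omega\alpha)\right] + |H(j\omega)|^2 S_Y(\omega)\right\}d\omega \end{aligned}$$

Thus,


$$C = \frac{1}{2\pi}\int\left\{S_s(\omega)\left|\exp(j\omega\alpha) - H(j\omega)\right|^2 + |H(j\omega)|^2 S_n(\omega)\right\}d\omega$$


Now if we substitute $\hat{H}(j\omega)$ from Eq. (4.41) into the above equation, we obtain:

$$\begin{aligned} C^\circ &= \frac{1}{2\pi}\int\left\{S_s(\omega)\left|\exp(j\omega\alpha) - \frac{\exp(j\omega\alpha)\,S_s(\omega)}{S_Y(\omega)}\right|^2 + \frac{S_s^2(\omega)\,S_n(\omega)}{S_Y^2(\omega)}\right\}d\omega \\ &= \frac{1}{2\pi}\int \frac{S_s(\omega)\,S_n(\omega)}{S_Y(\omega)}\,d\omega \end{aligned}$$

The proof is now complete.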

Example 3

Assume that the signal $s(t)$ and the noise $n(t)$ are uncorrelated and that they are both of zero mean. Let


$$S_s(\omega) = \frac{1}{1 + \omega^2}$$


and


$$S_n(\omega) = 1$$


Obtain the optimal estimate $\hat{s}(t)$ of $s(t + \alpha)$.


Solution

Since the noise and the signal are uncorrelated, $S_Y(\omega) = S_s(\omega) + S_n(\omega)$. If $\alpha = 0$, then


$$\hat{H}(j\omega) = \frac{\dfrac{1}{1 + \omega^2}}{1 + \dfrac{1}{1 + \omega^2}} = \frac{1}{2 + \omega^2}$$

and


$$\hat{h}(t) = \frac{1}{2\sqrt{2}}\exp\left[-\sqrt{2}\,|t|\right]$$

For prediction and smoothing, $\alpha \neq 0$, and then

$$\hat{H}(j\omega) = \frac{1}{2 + \omega^2}\exp(j\omega\alpha)$$


and $\hat{h}(t)$ is its inverse Fourier transform.
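The quantities in Example 3 can be checked by numerical integration. The sketch below is an illustration for the case $\alpha = 0$; the function names and the integration scheme (a midpoint rule after the substitution $\omega = \tan\theta$) are choices made here, not part of the text. It evaluates the minimum error of Eq. (4.43) and the value $\hat{h}(0)$ from Eq. (4.42); for these spectra both equal $1/(2\sqrt{2})$.

```python
import math

def s_signal(w):
    """Signal power spectrum of Example 3: 1 / (1 + w^2)."""
    return 1.0 / (1.0 + w * w)

def s_noise(w):
    """Noise power spectrum of Example 3: white noise of unit level."""
    return 1.0

def wiener_gain(w):
    """Non-causal Wiener filter for alpha = 0: S_s / (S_s + S_n)."""
    return s_signal(w) / (s_signal(w) + s_noise(w))

def integrate_real_line(f, steps=2000):
    """Integrate f over (-inf, inf) via w = tan(theta) and the midpoint rule."""
    d = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = -math.pi / 2 + (k + 0.5) * d
        w = math.tan(theta)
        total += f(w) * (1.0 + w * w)   # dw = (1 + w^2) d theta
    return total * d

def error_density(w):
    """Integrand of Eq. (4.43): S_s S_n / S_Y."""
    return s_signal(w) * s_noise(w) / (s_signal(w) + s_noise(w))

min_error = integrate_real_line(error_density) / (2 * math.pi)
h_at_zero = integrate_real_line(wiener_gain) / (2 * math.pi)
print(min_error, h_at_zero, 1 / (2 * math.sqrt(2)))
```

Since the signal power is $R_s(0) = 1/2$, a minimum error of about 0.354 means the non-causal filter removes only part of the uncertainty at this noise level.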


4.7.2 A Very Important Remark 

Although the optimal impulse response $\hat{h}(t)$ corresponding to $\hat{s}(t)$ is not realizable, it can be solved mathematically. We have solved for $\hat{H}(j\omega)$ by utilizing the frequency domain analysis, where $\hat{h}(t)$ is the inverse Fourier transform of $\hat{H}(j\omega)$. We should emphasize that the solution was possible in closed form (see Eqs. 4.41-4.43) by making some significant assumptions:

(a) First, we assumed that the measurement $Y(t)$ passes through a time-invariant linear system (filter).

(b) The measurement of the observation $Y(t)$ was available over the semi-infinite interval.

Assumptions (a) and (b) were made so that we could utilize the frequency 
domain approach to solve the complicated Wiener-Hopf equation. 


4.7.3 Wiener-Kolmogorov Theory for the 
Time-Varying Case 

It should be emphasized that the Wiener-Kolmogorov theory does not have 
to satisfy assumptions (a) and (b). In that case, the optimal linear system will 
be time-varying, and we would not be able to use the frequency domain 
analysis to advantage. 

The W-K theory for the time-varying case assumes the availability of the observation $Y(t)$ over the finite interval $[t_0, t]$. Now we will seek a time-varying impulse function $\hat{h}(t, \tau)$ such that (see sketch):


$$\hat{s}(t) = \int_{t_0}^{t} \hat{h}(t, \tau)\,Y(\tau)\,d\tau \qquad (4.46)$$

where


$$C^\circ = \min E[e^2(t)] = E\left\{[s(t + \alpha) - \hat{s}(t)]^2\right\} \qquad (4.47)$$


over all $h(t, \tau)$.


We now state a general theorem concerning the optimal solution. 


Theorem 5 (Wiener-Hopf) 

The optimal solution $\hat{s}(t)$ given by Eq. (4.46) is obtained if and only if $\hat{h}(t, \tau)$ is solved from:


$$R_{sY}(t - \sigma) = \int_{t_0}^{t} \hat{h}(t, \alpha)\,R_Y(\sigma - \alpha)\,d\alpha \qquad (4.48)$$


Proof

The proof of the Wiener-Hopf equation given by (4.48) is equivalent to the orthogonality principle, as already discussed by Theorem 3. The proof is identical to that of Theorem 3, with the only difference being that the integral limits are from $t_0$ to $t$ and $h(t - \tau)$ is replaced by $h(t, \tau)$.

Note that if the power spectrums of $n(t)$ and $s(t)$ do not overlap (see sketch), then $S_s(\omega)\,S_n(\omega) = 0$ and from Eq. (4.43), we get

$$C^\circ = 0$$

Thus, there is no error in the system. Hence, we can separate the signal and the noise perfectly.


4.8 OPTIMUM CAUSAL SYSTEMS 

Now we shall seek an optimum system which is constrained to be physically
realizable, i.e., the impulse response should satisfy $h(\lambda) = 0$ whenever
$\lambda < 0$. Thus, from Eq. (4.38):

$$\hat{s}(t) = \int_{0}^{\infty} h(\lambda)\, Y(t - \lambda)\, d\lambda \tag{4.49}$$

that is, $\hat{s}(t)$ is not a function of $Y(t - \lambda)$ for $\lambda < 0$, which is not
available, since $h(\lambda) = 0$ for $\lambda < 0$. The upper bound is $\infty$, since the
observation over the interval $(-\infty, t]$ is available to the estimator.

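Without any loss of generality assume that $\alpha = 0$. Then the orthogonality
principle is:

$$E\{[s(t) - \hat{s}(t)]\, Y(t - \tau)\} = 0, \quad \text{for } 0 < \tau < \infty \tag{4.50}$$

and its corresponding Wiener-Hopf equation is:

$$R_{sY}(\tau) = \int_{0}^{\infty} h(\sigma)\, R_Y(\tau - \sigma)\, d\sigma \tag{4.51}$$

or

$$R_{sY}(\tau) = R_{\hat{s}Y}(\tau), \quad \text{for all } \tau > 0 \tag{4.52}$$

(see Eq. 4.50).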

Let $q(\tau)$ be defined by:

$$q(\tau) = R_{sY}(\tau) - R_{\hat{s}Y}(\tau), \quad \text{for all } \tau \tag{4.53}$$

Note that for all $\tau > 0$, $q(\tau) = 0$. Taking the Fourier transform of the above
yields:

$$Q(\omega) = S_{sY}(\omega) - S_{\hat{s}Y}(\omega) = S_{sY}(\omega) - H(j\omega)\, S_Y(\omega) \tag{4.54}$$

assuming $Q(\omega)$ exists. Now replace $\omega = s/j$ in Eq. (4.54) to get the bilateral
Laplace transform:

$$Q(s) = S_{sY}(s) - S_{\hat{s}Y}(s) = S_{sY}(s) - H(s)\, S_Y(s) \tag{4.55}$$

We have already discussed the fact that the bilateral transform $F(s)$ of any
absolutely integrable function $f(t)$, for $t > 0$, will have poles in the left-half
plane (LHP), and, for $t < 0$, will have poles in the right-half plane (RHP).
Thus, $Q(s)$ cannot have LHP poles since $q(\tau) = 0$ for all $\tau > 0$. We know $S_Y(s)$
is an even function of $s$; let us decompose it as follows:

$$S_Y(s) = S_Y^{+}(s)\, S_Y^{+}(-s) \tag{4.56}$$

where $S_Y^{+}(s)$ will have LHP poles and $S_Y^{+}(-s)$ will have RHP poles (that is,
$S_Y^{+}(s)$ is analytic in the RHP and $S_Y^{+}(-s)$ is analytic in the LHP). Using
Eq. (4.56) in Eq. (4.55) yields:

$$Q(s) = S_{sY}(s) - H(s)\, S_Y^{+}(s)\, S_Y^{+}(-s) \tag{4.57}$$

From Eq. (4.57) we obtain:

$$H(s)\, S_Y^{+}(s) = \frac{S_{sY}(s)}{S_Y^{+}(-s)} - \frac{Q(s)}{S_Y^{+}(-s)} \tag{4.58}$$

We observe that $H(s)\, S_Y^{+}(s)$ has its poles in the LHP and $Q(s)/S_Y^{+}(-s)$ has all
its poles in the RHP. But $S_{sY}(s)/S_Y^{+}(-s)$ has poles all over the complex plane.

Let

$$G(s) = \frac{S_{sY}(s)}{S_Y^{+}(-s)} \tag{4.59}$$

The partial fraction expansion of $G(s)$ can be decomposed as:

$$G(s) = G_1(s) + G_2(s) \tag{4.60}$$

where $G_1(s)$ will have LHP poles and $G_2(s)$ will have RHP poles only.

Now choose (see Eq. 4.38):

$$H(s) = \frac{G_1(s)}{S_Y^{+}(s)} \tag{4.61}$$

Thus, $H(s)$ given by Eq. (4.61) is the solution of the Wiener-Hopf equation.
The above solution is due to Shannon and Bode.

The following examples are taken from reference (8) . 


Example 4 

Let $s(t)$ and $n(t)$ be stochastic signals of zero mean, such that:

$$R_s(\tau) = \tfrac{3}{2} \exp(-|\tau|)$$

$$R_n(\tau) = \delta(\tau)$$

and

$$E\{n(t)\, s(t)\} = 0$$

Let us derive an optimal estimate $\hat{s}(t)$ of $s(t)$ over $(-\infty, t)$.

Solution

From Eq. (4.58):

$$H(s)\, S_Y^{+}(s) = \frac{S_{sY}(s)}{S_Y^{+}(-s)} - \frac{Q(s)}{S_Y^{+}(-s)}$$

and from (4.61):

$$H(s) = \frac{G_1(s)}{S_Y^{+}(s)}$$

where $G_1(s)$ corresponds to the LHP poles of $S_{sY}(s)/S_Y^{+}(-s)$ upon partial
fraction expansion. We get:

$$S_{sY}(\omega) = S_s(\omega) = \frac{3}{1 + \omega^2}$$

$$S_Y(\omega) = \frac{4 + \omega^2}{1 + \omega^2}$$

The bilateral Laplace transform corresponding to $S_Y(\omega)$ is:

$$S_Y(s) = \frac{(2 + s)(2 - s)}{(1 + s)(1 - s)}$$

or

$$S_Y^{+}(s) = \frac{2 + s}{1 + s}$$

and

$$\frac{S_{sY}(s)}{S_Y^{+}(-s)} = \frac{3/(1 - s^2)}{(2 - s)/(1 - s)} = \frac{3}{(1 + s)(2 - s)} = \frac{1}{1 + s} + \frac{1}{2 - s}$$

Hence,

$$G_1(s) = \frac{1}{1 + s} \quad \text{and} \quad H(s) = \frac{1}{2 + s}$$

so that

$$h(t) = \exp(-2t), \quad t \ge 0$$

(See sketch.)

Sometimes we designate the procedure by the block diagram:

$Y(t) \rightarrow \boxed{\dfrac{1}{s + 2}} \rightarrow \hat{s}(t)$

The filter can also be written as a differential equation:

$$\dot{\hat{s}}(t) = -2\hat{s}(t) + Y(t)$$
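The result of Example 4 can be checked numerically against the Wiener-Hopf equation (4.51). The following is only a sketch under stated assumptions: it takes $R_{sY} = R_s = \tfrac{3}{2}\exp(-|\tau|)$ and $R_Y = R_s + \delta$ as in the example, truncates the infinite integral at a hypothetical upper limit, and uses a simple midpoint rule. The function names are illustrative and not from the text.

```python
import math

def r_s(tau):
    """R_s(tau) = (3/2) exp(-|tau|); equals R_sY because s and n are uncorrelated."""
    return 1.5 * math.exp(-abs(tau))

def h(t):
    """Causal impulse response found in Example 4: h(t) = exp(-2t) for t >= 0."""
    return math.exp(-2.0 * t) if t >= 0.0 else 0.0

def wiener_hopf_rhs(tau, upper=20.0, n=40_000):
    """Approximate int_0^inf h(x) R_Y(tau - x) dx for tau > 0.

    R_Y = R_s + delta, so the delta term contributes h(tau) exactly and only
    the smooth part is integrated (midpoint rule, truncated at `upper`).
    """
    dx = upper / n
    smooth = sum(h((i + 0.5) * dx) * r_s(tau - (i + 0.5) * dx) for i in range(n)) * dx
    return smooth + h(tau)

# Both columns should agree for every tau > 0 if h solves (4.51).
for tau in (0.25, 1.0, 3.0):
    print(f"tau={tau}: R_sY={r_s(tau):.6f}  integral={wiener_hopf_rhs(tau):.6f}")
```

Agreement of the two columns for several positive lags is consistent with $H(s) = 1/(2 + s)$ being the causal solution; it is a spot check, not a proof.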

Example 5

Let $s(t)$ and $n(t)$ be given such that:

$$R_s(\tau) = \exp(-|\tau|)$$

$$R_{sn}(\tau) = 0$$

$$R_n(\tau) = \exp(-2|\tau|)$$

where $s(t)$ and $n(t)$ are of zero mean. Let $Y(\lambda) = s(\lambda) + n(\lambda)$ be given on
$(-\infty, t]$. Find the optimal estimate $\hat{s}(t)$ of $s(t)$.

$$S_{sY}(\omega) = S_s(\omega) = \frac{2}{1 + \omega^2}$$

and

$$S_Y(\omega) = S_s(\omega) + S_n(\omega) = \frac{6(2 + \omega^2)}{(1 + \omega^2)(4 + \omega^2)}$$

Now the bilateral transform corresponding to $S_Y(\omega)$ is obtained by inspection:

$$S_Y^{+}(s) = \frac{\sqrt{6}\,(\sqrt{2} + s)}{(1 + s)(2 + s)}$$

Also, the partial fraction expansion of $S_s(s)/S_Y^{+}(-s)$ must be obtained:

$$\frac{S_s(s)}{S_Y^{+}(-s)} = \frac{2/(1 - s^2)}{\sqrt{6}\,(\sqrt{2} - s)/[(1 - s)(2 - s)]} = G_1(s) + G_2(s)$$

where

$$G_1(s) = \frac{\sqrt{6}}{1 + \sqrt{2}} \cdot \frac{1}{1 + s}$$

after partial fraction expansion. Thus,

$$H(s) = \frac{1}{1 + \sqrt{2}} \cdot \frac{2 + s}{\sqrt{2} + s}$$

The optimum filter is given via the figure.

(Figure: block diagram of the optimum filter.)

4.8.1 Optimal Prediction and Smoothing

We have thus far obtained the optimal estimate $\hat{s}(t)$ of $s(t)$ given $Y(\tau)$ on
the interval $(-\infty, t]$, i.e., we have derived the optimal filter. Remember that
$\hat{s}(t)$ is the output of the linear system with the impulse response $h(t)$ and the
input $Y(t)$. Suppose we are interested in estimating $s(t + t_0)$ based on the
same observation $Y(\tau)$ on $(-\infty, t]$, where $t_0 > 0$. This is called prediction.
Before obtaining the optimal predictor $\hat{s}(\cdot)$, let us generalize the estimation
problem somewhat.


Let $s(t)$ and $n(t)$ be as before, i.e., they are zero mean and wide-sense
stationary such that:

$$R_{sn}(\tau) = 0$$

Define

$$W(t) = \int_{-\infty}^{\infty} g(\lambda)\, s(t - \lambda)\, d\lambda \tag{4.62}$$

where $g(\cdot)$ is the impulse response of a time-invariant system.
Now let us minimize

$$C = E[W(t) - W_0(t)]^2 \tag{4.63}$$

where $W_0(t)$ is the output of the filter with impulse response $h(t)$ and input
$Y(t)$, and $h(\cdot)$ is restricted to be causal. The optimal solution $\hat{W}(t)$ is thus the
output of the system with impulse response $\hat{h}(t)$ and the input $Y(t)$. Equations
(4.62) and (4.63) are the generalization of the filtering problem. For
example, if $g(t) = \delta(t)$, then $W(t) = s(t)$.

If $g(t) = \delta(t \pm t_0)$, $t_0 > 0$, then $W(t) = s(t \pm t_0)$, which corresponds to
prediction ($+$) or smoothing ($-$) from the observation $Y(\lambda)$ over the
interval $(-\infty, t)$.

If $g(t) = \exp(-at)$, $t > 0$, then

$$W(t) = \int_{0}^{\infty} \exp(-a\lambda)\, s(t - \lambda)\, d\lambda = \int_{-\infty}^{t} \exp[-a(t - \lambda)]\, s(\lambda)\, d\lambda \tag{4.64}$$

Let $\mathcal{G}(j\omega)$ denote the Fourier transform of $g(t)$. Then following the same
procedure as from Eq. (4.50) to (4.61) we obtain:

$$\frac{\mathcal{G}(s)\, S_{sY}(s)}{S_Y^{+}(-s)} = G_1(s) + G_2(s) \tag{4.65}$$

where (4.65) is a generalization of (4.60).

Now $H(s)$ is given by:

$$H(s) = \frac{G_1(s)}{S_Y^{+}(s)} \tag{4.66}$$

Remark 4. In Examples 4 and 5, $\mathcal{G}(s) = 1$.

Example 6 

Use Example 4 to obtain the best estimate of $s(t + t_0)$, $t_0 > 0$.

Solution

$g(\lambda) = \delta(\lambda + t_0)$ or $\mathcal{G}(s) = \exp(t_0 s)$. Thus, as before,

$$S_Y^{+}(s) = \frac{2 + s}{1 + s}$$

Now, due to the factor $\exp(t_0 s)$, the decomposition of $\mathcal{G}(s)\, S_{sY}(s)/S_Y^{+}(-s)$
is given by:

$$\frac{\exp(t_0 s)\, S_{sY}(s)}{S_Y^{+}(-s)} = G_1(s) + G_2(s)$$

However, let us derive the portion of the function $\mathcal{G}(s)\, S_{sY}(s)/S_Y^{+}(-s)$
corresponding to $t > 0$, or $G_1(s)$. Thus,

$$G_1(s) = \frac{\exp(-t_0)}{1 + s}$$

therefore, $H(s)$ is then given by:

$$H(s) = \frac{\exp(-t_0)}{2 + s}$$

For smoothing, the results are similar.
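The step "the portion corresponding to $t > 0$" in Example 6 can be made explicit by inverting the bilateral transform. The lines below are a sketch, assuming $u(t)$ denotes the unit step (a symbol not used in the text), that $S_Y^{+}(s) = (2+s)/(1+s)$ is the LHP factor of $S_Y(s)$ from Example 4, and that $g_1(t)$ names the causal part of the inverse transform.

```latex
\begin{aligned}
e^{t_0 s}\left[\frac{1}{1+s} + \frac{1}{2-s}\right]
  &\;\longleftrightarrow\;
  e^{-(t+t_0)}\,u(t+t_0) \;+\; e^{2(t+t_0)}\,u\bigl(-(t+t_0)\bigr), \\[4pt]
\text{for } t>0:\quad g_1(t) = e^{-t_0}\,e^{-t}
  &\;\longleftrightarrow\; G_1(s) = \frac{e^{-t_0}}{1+s}, \\[4pt]
H(s) = \frac{G_1(s)}{S_Y^{+}(s)}
  &= \frac{e^{-t_0}}{1+s}\cdot\frac{1+s}{2+s} = \frac{e^{-t_0}}{2+s}.
\end{aligned}
```

Under these assumptions the predictor is the filter of Example 4 scaled by $e^{-t_0}$, so the predicted value decays toward the zero mean as the lead $t_0$ grows.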





4.9 MATCHED FILTERING 

In laser and radar applications, when a system is used to detect a target, 
the form of the signal must be known. However, often the signal is con- 
taminated by additive noise. A good criterion for estimation could be the 
signal-to-noise ratio (SNR), which we would be interested in maximizing. 

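Now let us assume that $s(t)$ is a deterministic signal such that its Fourier
transform (denoted by $S(\omega)$) exists. Let $S_n(\omega)$ be the power spectrum of the
noise contaminating the signal. Let both the signal and the noise pass through
a time-invariant system with the transfer function $H(j\omega)$, and let $Y_s(t)$ denote
the output corresponding to $s(t)$ with $Y_n(t)$ the output corresponding to $n(t)$.

Suppose at $t = t_1$ we are interested in maximizing

$$\rho = \frac{Y_s^2(t_1)}{E[Y_n^2(t_1)]} \tag{4.67}$$

$Y_s^2(t)$ is the output power of the signal, and we know that $E[Y_n^2(t)]$ is the
output power due to noise. We can write Eq. (4.67) in terms of the frequency
parameter. We know that:

$$Y_s(t) = \int_{-\infty}^{\infty} h(t - \tau)\, s(\tau)\, d\tau \tag{4.68}$$

and

$$Y_n(t) = \int_{-\infty}^{\infty} h(t - \tau)\, n(\tau)\, d\tau \tag{4.69}$$

Also note that:

$$S_{Y_n}(\omega) = |H(j\omega)|^2\, S_n(\omega) \tag{4.70}$$

and

$$Y_s(\omega) = H(j\omega)\, S(\omega) \tag{4.71}$$

Thus, from (4.71), $Y_s(t)$ can be obtained as $\mathcal{F}^{-1}$ of $H(j\omega)\, S(\omega)$, i.e.,

$$Y_s(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} H(j\omega)\, S(\omega)\, \exp(j\omega t)\, d\omega \tag{4.72}$$

and $E[Y_n^2(t)]$ as the $\mathcal{F}^{-1}$ of $S_{Y_n}(\omega)$ evaluated at zero lag. Thus,

$$E[Y_n^2(0)] = E[Y_n^2(t)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} |H(j\omega)|^2\, S_n(\omega)\, d\omega \tag{4.73}$$

If we are interested in maximizing the SNR given by (4.67) at $t = t_1$, we must
maximize:

$$\rho = \frac{\dfrac{1}{2\pi} \left| \displaystyle\int_{-\infty}^{\infty} H(j\omega)\, S(\omega)\, \exp(j\omega t_1)\, d\omega \right|^2}{\displaystyle\int_{-\infty}^{\infty} |H(j\omega)|^2\, S_n(\omega)\, d\omega} \tag{4.74}$$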


We now state and prove the following theorem. 

Theorem 6 

The maximum value of the signal-to-noise ratio $\rho$ given by Eq. (4.74) is
obtained if:

$$H(j\omega) = k\, \frac{S^{*}(\omega)}{S_n(\omega)}\, \exp(-j\omega t_1) \tag{4.75}$$

where $k$ is a constant. Before proving the above, we note the following:

The intuitive concept of Eq. (4.75) is obvious: The filter should pass those
frequencies for which the amplitude spectrum of the signal is large compared
to $S_n(\omega)$, which is the power spectrum of the noise.

The special case where $S_n(\omega)$ is constant, say $N_0$, i.e., white noise, is very
important. In that case Eq. (4.75) becomes:

$$H(j\omega) = \frac{k}{N_0}\, S^{*}(\omega)\, \exp(-j\omega t_1) \tag{4.76}$$

The factor $k/N_0$ is a gain, which we shall assume is unity without any loss of
generality. Since the transfer function that maximizes $\rho$ is given by the
conjugate of $S(\omega)$ (and $\exp(-j\omega t_1)$), the filter $H(j\omega)$ is called the conjugate
filter. However, a more popular name is the matched filter, since $H(j\omega)$ is to
match $S^{*}(\omega) \exp(-j\omega t_1)$.
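A discrete-time analogue may help make the white-noise case concrete. The sketch below is an illustration under stated assumptions, not the text's derivation: it replaces the integrals by finite sums, uses a hypothetical real-valued pulse `s`, a hypothetical noise variance `n0`, and an FIR filter `h`, so that the matched filter is simply the time-reversed pulse, $h[k] = s[t_1 - k]$.

```python
import random

def output_snr(h, s, n0, t1):
    """Discrete analogue of the SNR: (filtered signal at t1)^2 / (output noise power).

    h  : FIR filter taps
    s  : deterministic signal samples
    n0 : variance of the white noise at the filter input
    """
    signal_out = sum(h[k] * s[t1 - k] for k in range(len(h)) if 0 <= t1 - k < len(s))
    noise_power = n0 * sum(c * c for c in h)
    return signal_out ** 2 / noise_power

s = [0.0, 1.0, 2.0, 3.0, 1.0]      # hypothetical pulse shape
n0 = 0.5                           # hypothetical white-noise variance
t1 = len(s) - 1                    # sampling instant

# Matched filter for real signals: time-reversed copy of the pulse.
matched = [s[t1 - k] for k in range(len(s))]

best = output_snr(matched, s, n0, t1)
bound = sum(x * x for x in s) / n0           # Cauchy-Schwarz upper bound (signal energy / n0)

# Randomly chosen filters never exceed the bound.
rng = random.Random(0)
others = [output_snr([rng.uniform(-1, 1) for _ in s], s, n0, t1) for _ in range(1000)]

print(best, bound, max(others))
```

The matched filter attains the bound (signal energy divided by the noise variance), while arbitrary filters fall below it, mirroring the equality condition of the Cauchy-Schwarz inequality.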

Proof of Theorem 6 

The proof is relatively simple. Using the Cauchy-Schwarz inequality:

$$\left| \int f(\omega)\, g(\omega)\, d\omega \right|^2 \le \int |f(\omega)|^2\, d\omega \int |g(\omega)|^2\, d\omega \tag{4.77}$$

we set:

$$f(\omega) = H(j\omega)\, [S_n(\omega)]^{1/2}$$

and

$$g(\omega) = \frac{S(\omega)\, \exp(j\omega t_1)}{[S_n(\omega)]^{1/2}}$$

The left-hand side, when divided by the first integral on the right, is simply $2\pi\rho$,
which implies:

$$\rho \le \frac{1}{2\pi} \int \frac{|S(\omega)|^2}{S_n(\omega)}\, d\omega \tag{4.78}$$

As a consequence of the Cauchy-Schwarz inequality, if $f(\omega) = k\, g^{*}(\omega)$, then
we shall have the equality in (4.77). Therefore, $\rho$ becomes maximum if:

$$H(j\omega) = k\, \frac{S^{*}(\omega)}{S_n(\omega)}\, \exp(-j\omega t_1)$$

Thus, the proof is completed.

4.10 KALMAN-BUCY FILTERING

Before discussing Kalman filtering, let us review some basic concepts 
needed in the discussion. 

Definition 3 

A continuous Markov process $X(t)$ for $t \ge t_0$ is a process that, for every
$\tau < t$,

$$f(X(t) \mid X(\lambda), \text{ for } \lambda \in [t_0, \tau]) = f(X(t) \mid X(\tau)) \tag{4.79}$$

where $\lambda$ can assume any value in the interval $t_0 \le \lambda \le \tau < t$. For the discrete
case the definition is similar. Let $t_0, t_1, \ldots, t_n$ be such that:

$$t_0 < t_1 < t_2 < \cdots < t_n \tag{4.80}$$

and let $\{X(t_i)\}$ be the corresponding discrete set of random variables.
Let us use the notation $X(i)$ instead of $X(t_i)$. We can now define the
discrete Markov process.

Definition 4

The process $\{X(i)\}$ is a Markov process if for every $n$ such that (4.80) is
satisfied, we have:

$$f(X(n) \mid X(0), X(1), \ldots, X(n - 1)) = f(X(n) \mid X(n - 1)) \tag{4.81}$$

Now utilizing:

$$f(X(0), X(1), \ldots, X(n)) = f(X(0), X(1), \ldots, X(n - 1))\, f(X(n) \mid X(0), X(1), \ldots, X(n - 1))$$

and continuing in this manner, and making use of definition (4.81), we get:

$$f(X(t_0), X(t_1), \ldots, X(t_n)) = f(X(0))\, f(X(1) \mid X(0)) \cdots f(X(n) \mid X(n - 1)) = f(X(0)) \prod_{i=1}^{n} f(X(i) \mid X(i - 1)) \tag{4.82}$$

Hence, the Markov process is defined by the conditional probability density
functions $f(X(i) \mid X(i - 1))$ for $i = 1, \ldots, n$, together with the initial density
$f(X(0))$. The Markov process is fundamental to Kalman-Bucy filter development.
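The factorization of the joint density into an initial density and one-step conditional densities can be illustrated with a finite-state chain. This is a minimal sketch, assuming a hypothetical two-state process with made-up probabilities; probabilities take the place of the densities $f(\cdot)$, and the names `initial`, `transition`, and `joint_probability` are illustrative only.

```python
from itertools import product

# Hypothetical two-state chain: initial[i] plays the role of f(X(0) = i),
# transition[i][j] plays the role of f(X(k) = j | X(k-1) = i).
initial = [0.6, 0.4]
transition = [[0.9, 0.1],
              [0.3, 0.7]]

def joint_probability(path):
    """Joint probability of a whole path as the product of one-step conditionals."""
    p = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= transition[prev][cur]
    return p

# Summing the factorized joint probability over all paths of length 4 gives 1,
# as it must for a valid joint distribution.
total = sum(joint_probability(path) for path in product(range(2), repeat=4))
print(joint_probability((0, 0, 1)), total)
```

Only the initial distribution and the one-step transition table are needed to evaluate the probability of any path, which is the property that makes sequential (recursive) estimation possible.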

As already discussed, a linear system can be characterized either by the
classical method, using the impulse response, or by the modern approach,
using state variables. Kalman-Bucy filtering relies on the state variable
characterization, where the state is a Markov process.

The reader is assumed to be familiar with the simple state variable represen- 
tation. If this familiarity does not exist, the reader should consult Appen- 
dix E, which contains a simplified discussion of state variables along with 
some examples. That appendix is sufficient for our purposes. 


4.10.1 Continuous Kalman-Bucy Recursive Filtering 

We shall briefly discuss the continuous version of Kalman-Bucy (K-B) filter- 
ing. The most important part of K-B filtering is the fact that estimation is of a 
sequential nature (Markovian). We shall discuss K-B filtering for linear systems 
unless specified otherwise. 

The state variable characterization of a linear system can be generally written 
as: 


$$\begin{aligned} \dot{X} &= A(t)\, X(t) + B(t)\, U(t) && \text{(a)} \\ Y(t) &= C(t)\, X(t) + D(t)\, U(t) && \text{(b)} \end{aligned} \tag{4.83}$$

where $X(t) = (X_1(t), \ldots, X_n(t))'$, the prime denotes the transpose, $U(t)$
is a $p \times 1$ matrix, and $Y(t)$ is a $q \times 1$ matrix. $A(t)$, $B(t)$, $C(t)$, $D(t)$ are matrices
of order $n \times n$, $n \times p$, $q \times n$, and $q \times p$, respectively.


Example 7 

Let a time-invariant system be characterized by the following differential
equation:

$$\frac{d^3 Y(t)}{dt^3} + 2\, \frac{d^2 Y(t)}{dt^2} + 3\, \frac{dY(t)}{dt} + Y(t) = 2U(t) \tag{4.84}$$

where $Y(t)$ is the output, $U(t)$ the input.

Define the state variables as follows:

$$X_1(t) = Y(t) \tag{4.85}$$

$$X_2(t) = \frac{dX_1(t)}{dt} = \frac{dY(t)}{dt} \tag{4.86}$$

$$X_3(t) = \frac{dX_2(t)}{dt} = \frac{d^2 Y(t)}{dt^2} \tag{4.87}$$

Equation (4.84) can be arranged so that the highest-order derivative term
appears on one side of the equation. Thus,

$$\frac{d^3 Y(t)}{dt^3} = -2\, \frac{d^2 Y(t)}{dt^2} - 3\, \frac{dY(t)}{dt} - Y(t) + 2U(t) \tag{4.88}$$

Substituting (4.85)-(4.87) into (4.88) and utilizing the defining relations of
the state variables yields:

$$\dot{X}_1 = X_2(t) \tag{4.89a}$$

$$\dot{X}_2 = X_3(t) \tag{4.89b}$$

$$\dot{X}_3 = -X_1(t) - 3X_2(t) - 2X_3(t) + 2U(t) \tag{4.89c}$$

The system described by (4.84) can then be defined by the state variable
representation of the form (4.83). Thus,

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -3 & -2 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}$$

$$C = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

$$D = 0$$

The solution of $X(t)$ is given by:

$$X(t) = \Phi(t, t_0)\, X(t_0) + \int_{t_0}^{t} \Phi(t, \tau)\, B(\tau)\, U(\tau)\, d\tau \tag{4.90}$$

where

$$\frac{d\Phi(t, t_0)}{dt} = A(t)\, \Phi(t, t_0) \tag{4.91}$$

$$\Phi(t_0, t_0) = I \quad \text{(identity matrix)} \tag{4.92}$$

$\Phi(t, t_0)$ is called the transition matrix, which is a matrix of order $n \times n$.
Furthermore, it can be shown that (see Appendix E) the following relations
hold:

$$\Phi^{-1}(t, t_0) = \Phi(t_0, t) \tag{4.93}$$

$$\Phi(t_2, t_0) = \Phi(t_2, t_1)\, \Phi(t_1, t_0) \tag{4.94}$$

and $\Phi$ is a nonsingular matrix.

In a time-invariant system ($A$, $B$, $C$, and $D$ are constants), the transition
matrix $\Phi(t, t_0)$ takes the form:

$$\Phi(t, t_0) = \exp\{A \cdot (t - t_0)\}$$

where

$$\exp\{A \cdot t\} = I + At + \frac{A^2 t^2}{2!} + \cdots$$

A general diagram of the system given by Eq. (4.83) is given in Figure 4-3.

Fig. 4-3. State Variable Configuration
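The state-variable form of Example 7 and the series for $\exp\{A \cdot t\}$ can be tried out numerically. The sketch below is illustrative only: it assumes the matrix $A$ implied by (4.89a)-(4.89c), truncates the power series at a hypothetical number of terms, and uses hand-written helpers (`mat_mul`, `mat_exp`) rather than any library routine.

```python
def mat_mul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_exp(a, t, terms=40):
    """Truncated series exp(A t) = I + A t + A^2 t^2 / 2! + ... (first `terms` terms)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                        # current term, starts at I
    for k in range(1, terms):
        term = mat_mul(term, a)
        term = [[x * t / k for x in row] for row in term]    # now equals A^k t^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# State matrix implied by (4.89a)-(4.89c).
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [-1.0, -3.0, -2.0]]

phi_total = mat_exp(A, 1.2)                                  # Phi(1.2, 0)
phi_split = mat_mul(mat_exp(A, 0.7), mat_exp(A, 0.5))        # Phi(1.2, 0.5) Phi(0.5, 0)
phi_back = mat_mul(mat_exp(A, 1.2), mat_exp(A, -1.2))        # Phi(t, 0) Phi(0, t)

max_diff = max(abs(phi_total[i][j] - phi_split[i][j]) for i in range(3) for j in range(3))
print(max_diff)
```

The two products reproduce the composition property (4.94) and the inverse property (4.93) to within rounding error for this time-invariant case.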

The continuous Kalman estimation requires a linear system model of the
form:

$$\dot{X} = A(t)\, X(t) + B(t)\, U(t) \tag{4.95}$$

$$Y(t) = C(t)\, X(t) + v(t) \tag{4.96}$$

where $X(t)$ is assumed to be a random process (an $n \times 1$ matrix), $U(t)$ is a
random noise of zero mean (a $p \times 1$ matrix), and $v(t)$ is a random noise with zero
mean (a $q \times 1$ matrix) uncorrelated with $U(t)$. $A(t)$, $B(t)$, and $C(t)$ are
matrices of dimensions $n \times n$, $n \times p$, and $q \times n$, respectively. The observation
signal $Y(t)$ is contaminated by the additive noise process $v(t)$. The most
important property of Kalman estimation is that the differential equation
technique developed to obtain the optimal solution can be synthesized in a
recursive manner, because differential equation techniques are in most
instances equivalent or very closely related to recursive techniques. That is,
the estimate at one point does not need the processing of all the measurements,
but only the information stored by the point preceding it.

Let us assume the following statistical moments:

$$\begin{aligned} E\,U(t) &= 0 \\ E\,v(t) &= 0 \\ E\,U(t_1)\, U'(t_2) &= Q\, \delta(t_2 - t_1) \\ E\,v(t_1)\, v'(t_2) &= L\, \delta(t_2 - t_1) \\ E\,U(t_1)\, v'(t_2) &= 0 \end{aligned} \tag{4.97}$$

where $Q$ and $L$ are of dimensions $p \times p$ and $q \times q$, respectively. These
matrices are generally functions of time $t$, and $\delta(t_2 - t_1)$ is the Dirac delta
function. The functions $U(t)$ and $v(t)$ are white noise terms with respective
covariances $Q$ and $L$.

The Kalman recursive problem is one in which we are given the observation
values (continuous measurements) of $Y(\tau)$, $t_0 < \tau < t$, and it is desired to
find the estimate at time $t_1$, denoted as $\hat{X}(t_1 | t)$ or $\hat{X}(t_1)$, having the form:

$$\hat{X}(t_1 | t) = \int_{t_0}^{t} h(t_1, \tau)\, Y(\tau)\, d\tau$$

where $h(t, \tau)$ is the impulse response of a linear system with the input $Y(\cdot)$ and
the output $\hat{X}(\cdot)$ minimizing

$$E\{[X(t_1) - \hat{X}(t_1 | t)]'\, W\, [X(t_1) - \hat{X}(t_1 | t)]\} = E\, \| X(t_1) - \hat{X}(t_1 | t) \|_W^2 \tag{4.98}$$

where $W$ is any $n \times n$ positive semi-definite matrix (it can be shown that the
minimization of (4.98) is independent of $W$).

The state estimation problem can be divided into three classes: (1) filtering
if $t = t_1$, (2) prediction if $t_1 > t$, (3) smoothing if $t_1 < t$.


Filtering

The optimal solution is given in Kalman's original work. We know $\hat{X}(t | t)$ is
the optimal solution if and only if it satisfies:

$$E\{[X(t) - \hat{X}(t | t)]\, Y'(\tau)\} = 0, \quad \text{for } 0 < \tau < t \tag{4.99}$$

which is the orthogonality principle; without any loss of generality we have
assumed $t_0 = 0$.

Since we expect the optimal solution to be a combination of the estimate
$\hat{X}(t)$ and the measurement $Y(t)$, we make a guess that $\hat{X}(t | t)$ is the solution
of the differential equation

$$\dot{\hat{X}} = F_1(t)\, \hat{X}(t) + F_2(t)\, Y(t), \quad \hat{X}(0 | 0) = 0 \tag{4.100}$$

where $F_1(t)$ and $F_2(t)$ are chosen such that the orthogonality condition in
(4.99) is satisfied. We know that if the orthogonality condition is satisfied
the solution must be optimal (unique). Thus, if $F_1(t)$ and $F_2(t)$ could be found
such that Eq. (4.99) is satisfied, then the $\hat{X}(\cdot)$ corresponding to these $F_1(t)$ and
$F_2(t)$ must be optimal.

Indeed, it can be shown that the solution of the form given by Eq. (4.100)
satisfies the orthogonality principle. The solution is quite tedious. Let us state
the results via the theorem.

Theorem 7

The optimal K-B filtering estimate $\hat{X}(t)$ is the solution of Eq. (4.100), where

$$F_1(t) = A(t) - F_2(t)\, C(t) \tag{4.101}$$

and

$$F_2(t) = P(t)\, C'(t)\, L^{-1}(t) \tag{4.102}$$

$L(t)$ is given by Eq. (4.97) and $P(t)$ is given by:

$$P(t) = E\{[X(t) - \hat{X}(t | t)]\, [X(t) - \hat{X}(t | t)]'\} \tag{4.103}$$

and can be obtained as the solution of the nonlinear differential equation

$$\dot{P} = AP + PA' - PC' L^{-1} CP + BQB' \tag{4.104}$$

with the given initial condition $P(0) = E\{[X(0) - \hat{X}(0 | 0)]\, [X(0) - \hat{X}(0 | 0)]'\}$.
Note that we have dropped the argument $t$ for convenience. The proof will be
given later, but we shall first give an example.

In Figure 4-4, the optimum continuous filter is diagrammed. The input to
the system is the observation $Y(t)$, which is the contaminated signal, and the
outputs could be considered as $\hat{X}(t | t)$ or $C\hat{X}(t | t)$, where $C\hat{X}(t | t)$ is the
optimal estimate of $Y(t)$.

Fig. 4-4. Optimum Continuous Filter


Example 8

An object moves with an unknown constant velocity $V$ on a straight line
trajectory. Suppose we observe the projectile at the initial time $t_0 = 0$ at a
known point $s(0)$ as shown by:

(Sketch: straight-line trajectory starting at $s(0)$.)

Thereafter the projectile is tracked for $T$ seconds. The observation consists of the
displacement from the origin which has been contaminated by additive white
noise of spectral density $N_0$ watts/hertz. Let us assume the velocity is a zero
mean Gaussian random variable with variance $\sigma^2$. Let us find the Kalman filter
yielding the optimal linear estimate of $V$.

Solution

Since the speed is constant, $\dot{V} = 0$ and the observation $\tilde{Y}(t)$ by definition is:

$$\tilde{Y}(t) = s(t) + n(t) = s(0) + tV + n(t)$$

If we let $Y(t) = \tilde{Y}(t) - s(0)$, the dynamic system becomes

$$\dot{V} = 0$$

$$Y(t) = tV + n(t)$$

Thus, $A = B = 0$, $C = t$, and $L = N_0$, from which

$$\dot{\hat{V}} = F_2(t)\, [Y(t) - t\hat{V}]$$

$$\dot{P} = -P^2(t)\, t^2 / N_0$$

$$F_2(t) = P(t)\, t / N_0$$

The initial conditions are $\hat{V}(0 | 0) = 0$, $P(0) = \sigma^2$.

To solve for $\hat{V}(t)$* we need to obtain $F_2(t)$, which in turn requires the solution
of $P(t)$:

$$\int_{P(0) = \sigma^2}^{P(t)} P^{-2}\, dP = -\frac{1}{N_0} \int_{0}^{t} \tau^2\, d\tau$$

from which

$$P(t) = \frac{3 N_0 \sigma^2}{3 N_0 + \sigma^2 t^3}$$

Thus,

$$\hat{V}(T) = \int_{0}^{T} \frac{3 \sigma^2 \tau}{3 N_0 + \sigma^2 T^3}\, Y(\tau)\, d\tau, \quad 0 < \tau < T \tag{4.105}$$

For large $T$, the weighting in (4.105) becomes:

$$\frac{3 \sigma^2 \tau}{3 N_0 + \sigma^2 T^3} \approx \frac{3 \tau}{T^3}, \quad \text{for large } T$$

and

$$P(T) \rightarrow 0, \quad \text{as } T \rightarrow \infty$$

which implies

$$\hat{V}(T) \rightarrow V, \quad \text{as } T \rightarrow \infty$$

That is, if the contaminated signal is observed for a long time, we should get the
exact estimate.

*We shall denote $\hat{V}(t | t)$ by $\hat{V}(t)$.

Remark 5. In filtering we shall often write $P(t)$ instead of $P(t | t)$.
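The closed-form error variance of this example, $P(t) = 3N_0\sigma^2/(3N_0 + \sigma^2 t^3)$, can be cross-checked by integrating the Riccati equation $\dot{P} = -P^2 t^2/N_0$ numerically. The following is a minimal sketch, assuming a classical fourth-order Runge-Kutta step and hypothetical values for $\sigma^2$ and $N_0$; the function names are illustrative.

```python
def riccati_rhs(t, p, n0):
    """Right-hand side of dP/dt = -P^2 t^2 / N0 (A = B = 0, C = t, L = N0)."""
    return -(p * p) * t * t / n0

def integrate_p(t_end, sigma2, n0, steps=2000):
    """RK4 integration of the Riccati equation from P(0) = sigma^2 to t_end."""
    dt = t_end / steps
    t, p = 0.0, sigma2
    for _ in range(steps):
        k1 = riccati_rhs(t, p, n0)
        k2 = riccati_rhs(t + dt / 2, p + dt * k1 / 2, n0)
        k3 = riccati_rhs(t + dt / 2, p + dt * k2 / 2, n0)
        k4 = riccati_rhs(t + dt, p + dt * k3, n0)
        p += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return p

def closed_form_p(t, sigma2, n0):
    """P(t) = 3 N0 sigma^2 / (3 N0 + sigma^2 t^3)."""
    return 3 * n0 * sigma2 / (3 * n0 + sigma2 * t ** 3)

sigma2, n0 = 4.0, 1.0            # hypothetical prior variance and noise density
for t in (0.5, 2.0, 10.0):
    print(t, integrate_p(t, sigma2, n0), closed_form_p(t, sigma2, n0))
```

Both columns shrink roughly like $3N_0/t^3$ for large $t$, which is the numerical counterpart of the statement that a long observation yields the exact velocity.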

Example 9 

Let the observation $Y(t)$ be given by:

$$Y(t) = d \cos(\omega_0 t - \theta_0) + v(t) \tag{4.106}$$

where $d$, $\omega_0$, $\theta_0$ are, respectively, the amplitude, carrier frequency, and phase.
Let $v(t)$ be a white noise process with a variance of unity. Assume that $\omega_0$, $\theta_0$
are known exactly. Estimate $d$.

Solution

Since $d$ is constant, then $\dot{d} = 0$. Now, we can have:

$$\dot{X} = 0$$

$$X(0) = X(t) = d$$

$$Y(t) = \cos(\omega_0 t - \theta_0)\, X + v(t)$$

Hence, $A = B = 0$, $C = \cos(\omega_0 t - \theta_0)$, $Q = 1$, and $L = 1$. From Eqs. (4.100) to
(4.103):

$$\dot{P} = -[\cos(\omega_0 t - \theta_0)\, P(t)]^2 \tag{4.107}$$

$$F_2(t) = \cos(\omega_0 t - \theta_0)\, P(t), \quad F_1(t) = -F_2(t)\, C(t) = -\cos^2(\omega_0 t - \theta_0)\, P(t) \tag{4.108}$$

Thus,

$$\dot{\hat{X}} = F_1(t)\, \hat{X}(t) + F_2(t)\, Y(t) \tag{4.109}$$

where $F_1(t)$ and $F_2(t)$ are given by the previous equations.

The solution $\hat{X}(t)$ requires the solution $P(t)$ from Eq. (4.107). It is apparent
that even for the scalar case, the solution can become fairly tedious.
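Although tedious in general, the scalar Riccati equation of this example can be integrated in closed form. The lines below are a sketch, assuming the equation $\dot{P} = -[\cos(\omega_0 t - \theta_0)\,P(t)]^2$ and an initial variance $P(0) > 0$ (the initial value is not specified in the example and is an assumption here).

```latex
\begin{aligned}
\frac{d}{dt}\left(\frac{1}{P(t)}\right) &= -\frac{\dot{P}(t)}{P^2(t)} = \cos^2(\omega_0 t - \theta_0), \\[4pt]
\frac{1}{P(t)} &= \frac{1}{P(0)} + \int_0^t \cos^2(\omega_0 \tau - \theta_0)\, d\tau
  = \frac{1}{P(0)} + \frac{t}{2} + \frac{\sin\bigl(2(\omega_0 t - \theta_0)\bigr) + \sin(2\theta_0)}{4\omega_0}, \\[4pt]
P(t) &= \left[\frac{1}{P(0)} + \frac{t}{2} + \frac{\sin\bigl(2(\omega_0 t - \theta_0)\bigr) + \sin(2\theta_0)}{4\omega_0}\right]^{-1}.
\end{aligned}
```

For large $t$ the term $t/2$ dominates, so under these assumptions $P(t)$ decays roughly like $2/t$ and the amplitude estimate improves steadily with observation time.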

Remark 6. Note that $\hat{X}(t)$ is the estimate of $X(t)$, given the observation $Y(t)$.
The corresponding uncertainty (covariance) of $\hat{X}(t)$ is given by $P(t)$. Since,
with $e(t) = X(t) - \hat{X}(t)$,

$$P(t) = E\{e\, e'\} = E\{[X(t) - \hat{X}(t)]\, [X(t) - \hat{X}(t)]'\} = E\{[e(t) - E\,e(t)]\, [e(t) - E\,e(t)]'\} = \operatorname{cov} e(t)$$

for the case of the unbiased estimate, then $P(t)$ is indeed a covariance.

Example 10 

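In the previous example suppose $d$ is known perfectly and it is desired to
estimate $\omega_0$ and $\theta_0$. Obtain the model and the form of the solution.

Solution

Let

$$X(t) = d \cos(\omega_0 t - \theta_0)$$

Then

$$\dot{X}(t) = -d\, \omega_0 \sin(\omega_0 t - \theta_0)$$

Now if we define $X_1(t) = X(t)$ and $X_2(t) = \dot{X}_1(t) = \dot{X}(t)$, we have:

$$\dot{X}_1 = -d\, \omega_0 \sin(\omega_0 t - \theta_0) = X_2(t)$$

$$\dot{X}_2 = -d\, \omega_0^2 \cos(\omega_0 t - \theta_0) = -\omega_0^2\, X_1(t)$$

so that

$$\frac{d}{dt} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\omega_0^2 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \tag{4.110}$$

$$Y(t) = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + v(t) \tag{4.111}$$

Thus, by inspection:

$$A = \begin{bmatrix} 0 & 1 \\ -\omega_0^2 & 0 \end{bmatrix}, \quad B = 0, \quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad L = 1$$

Now the solution is more involved and the estimate $\hat{X}(t)$ of $X(t)$ with its
covariance $P(t)$ can be obtained as before.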

Example 1 1 

This example is taken from reference [8]. Assume that $Y(t)$ is a white noise
process with unknown mean $X$. Thus,

$$E\,Y(t) = X \tag{4.112a}$$

$$E\{[Y(t_1) - X]\, [Y(t_2) - X]\} = L\, \delta(t_2 - t_1) \tag{4.112b}$$

Suppose we want to estimate $X$ when the observation $Y(t)$ is received over the
interval $(0, t]$.

Solution

Since $X$ is constant, we can construct a model as follows.

$$\dot{X} = 0$$

$$Y(t) = X(t) + v(t)$$

$$E\,v(t_1)\, v(t_2) = L\, \delta(t_2 - t_1)$$

From Eq. (4.104), we get:

$$\dot{P}(t) = -L^{-1} P^2(t)$$

or

$$\frac{\dot{P}(t)}{P^2(t)} = -L^{-1}$$

Integrating both sides yields:

$$-P^{-1}(t) = -L^{-1} t - c$$

where $c$ is a constant.

However, at $t = 0$, we get:

$$P(0) = \frac{1}{c}$$

Thus,

$$P(t) = \frac{1}{L^{-1} t + P^{-1}(0)} \tag{4.113}$$

Now, substituting Eq. (4.113) into Eq. (4.100), i.e.,

$$\dot{\hat{X}} = F_1(t)\, \hat{X}(t) + F_2(t)\, Y(t), \quad \hat{X}(0) = 0$$

where, from Eqs. (4.101) and (4.102),

$$F_1(t) = A(t) - F_2(t)\, C(t) = -\frac{1}{t + \dfrac{L}{P(0)}}$$

$$F_2(t) = P(t)\, C'\, L^{-1}(t) = \frac{1}{t + \dfrac{L}{P(0)}}$$

we obtain:

$$\dot{\hat{X}} = -\frac{1}{t + \dfrac{L}{P(0)}}\, \hat{X}(t) + \frac{1}{t + \dfrac{L}{P(0)}}\, Y(t), \quad \hat{X}(0) = 0 \tag{4.114}$$

From the above equation, the transition matrix $\Phi(t, 0)$ is given by:

$$\Phi(t, 0) = \frac{L / P(0)}{t + \dfrac{L}{P(0)}}$$

This is true because

$$\dot{\Phi} = -\frac{1}{t + \dfrac{L}{P(0)}}\, \Phi, \quad \Phi(0, 0) = 1$$

Equation (4.114) can be solved by using Eq. (4.90). Thus,

$$\hat{X}(t) = \int_{0}^{t} \frac{\tau + \dfrac{L}{P(0)}}{t + \dfrac{L}{P(0)}} \cdot \frac{1}{\tau + \dfrac{L}{P(0)}}\, Y(\tau)\, d\tau \tag{4.115}$$

Simplification of the above gives rise to:

$$\hat{X}(t) = \frac{1}{t + \dfrac{L}{P(0)}} \int_{0}^{t} Y(\tau)\, d\tau \tag{4.116}$$

Since both $L$ and $P(0)$ are constants, we obtain for large $t$:

$$\hat{X}(t) = \lim_{t \rightarrow \infty} \frac{1}{t} \int_{0}^{t} Y(\tau)\, d\tau$$

which is expected. Thus, for a long observation, $\hat{X}(t)$ becomes independent of
$P(0)$.
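The equivalence between the filter differential equation of this example and its closed-form solution, $\hat{X}(t) = \frac{1}{t + L/P(0)} \int_0^t Y(\tau)\, d\tau$, can be checked numerically. The sketch below is illustrative only: it replaces the noisy observation by a hypothetical deterministic input $Y(t) = 2 + \sin t$ (so that the integral is known exactly), uses made-up values for $L$ and $P(0)$, and integrates with a standard fourth-order Runge-Kutta step.

```python
import math

def kb_mean_estimate(y, t_end, l, p0, steps=4000):
    """RK4 integration of d(xhat)/dt = (y(t) - xhat) / (t + L/P(0)), xhat(0) = 0."""
    a = l / p0

    def rhs(t, x):
        return (y(t) - x) / (t + a)

    dt = t_end / steps
    t, x = 0.0, 0.0
    for _ in range(steps):
        k1 = rhs(t, x)
        k2 = rhs(t + dt / 2, x + dt * k1 / 2)
        k3 = rhs(t + dt / 2, x + dt * k2 / 2)
        k4 = rhs(t + dt, x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return x

def closed_form(t, l, p0):
    """(1 / (t + L/P(0))) * integral of Y over [0, t] for Y(t) = 2 + sin t."""
    return (2.0 * t + 1.0 - math.cos(t)) / (t + l / p0)

def y(t):
    """Hypothetical noise-free observation used only for this check."""
    return 2.0 + math.sin(t)

l, p0 = 0.5, 2.0                 # hypothetical noise level and prior variance
print(kb_mean_estimate(y, 5.0, l, p0), closed_form(5.0, l, p0))
```

As $t$ grows, the term $L/P(0)$ in the denominator becomes negligible and the estimate approaches the running time average of the observation, independent of the prior.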

Now we shall prove Theorem 7. 

Proof 

We can extend the general Wiener-Hopf equation given by (4.48) to the
case where the signal $s(t)$ is changed to the vector $X$. Then the cross-correlation
function $R_{sY}(t - \alpha)$ will be simply changed to $R_{XY}(t - \alpha)$. Let us also
assume that the mean of $X$ and $Y$ is not zero. Then we will change
$R_{XY}(t - \alpha)$ and $R_Y(\sigma - \alpha)$ to $C_{XY}(t - \alpha)$ and $C_Y(\sigma - \alpha)$, respectively. Thus,
the generalized Wiener-Hopf equation becomes:

$$C_{XY}(t - \alpha) = \int_{t_0}^{t} G(t, \sigma)\, C_Y(\sigma - \alpha)\, d\sigma \tag{4.117}$$

where $G(t, \sigma)$ is the generalized impulse response.

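The above equation is equivalent to the orthogonality condition. Let us
take the derivative of the left-hand side, $C_{XY}(t - \alpha)$, with respect to $t$ to get:

$$\begin{aligned} \frac{\partial}{\partial t} C_{XY}(t - \alpha) &= \frac{\partial}{\partial t} E\{[X(t) - E[X(t)]]\, [Y(\alpha) - E[Y(\alpha)]]'\} \\ &= E[\dot{X}(t)\, Y'(\alpha)] - E[\dot{X}]\, E[Y'(\alpha)] \\ &= E[(AX + BU)\, Y'(\alpha)] - E[AX + BU]\, E[Y'(\alpha)] \\ &= A(t)\, C_{XY}(t - \alpha) + B(t)\, C_{UY}(t - \alpha) \end{aligned} \tag{4.118}$$

The above equation was obtained by using Eq. (4.95). Since $U(t)$ is independent
of both $Y(\alpha)$ and $X(\alpha)$ for $\alpha < t$, $C_{UY}(t - \alpha) = 0$. On the other
hand, the derivative of the right-hand side of Eq. (4.117) yields:

$$\frac{\partial}{\partial t} \int_{t_0}^{t} G(t, \sigma)\, C_Y(\sigma - \alpha)\, d\sigma = \int_{t_0}^{t} \frac{\partial G(t, \sigma)}{\partial t}\, C_Y(\sigma - \alpha)\, d\sigma + G(t, t)\, C_Y(t - \alpha) \tag{4.119}$$

However, the left-hand side of (4.119), after denoting $Z(t)$ for $C(t)\, X(t)$, can be
written as:

$$\begin{aligned} \frac{\partial}{\partial t} \int_{t_0}^{t} G(t, \sigma)\, C_Y(\sigma - \alpha)\, d\sigma &= \frac{\partial}{\partial t} \int_{t_0}^{t} G(t, \sigma)\, E\{[Y(\sigma) - m_Y]\, [Y(\alpha) - m_Y]'\}\, d\sigma \\ &= \frac{\partial}{\partial t} \int_{t_0}^{t} G(t, \sigma)\, E\{[Z(\sigma) - m_Y + v(\sigma)]\, [Z(\alpha) - m_Y + v(\alpha)]'\}\, d\sigma \\ &= \frac{\partial}{\partial t} \int_{t_0}^{t} G(t, \sigma)\, C_Z(\sigma - \alpha)\, d\sigma + \frac{\partial}{\partial t} G(t, \alpha)\, L(\alpha) \\ &= \int_{t_0}^{t} \frac{\partial G(t, \sigma)}{\partial t}\, C_Z(\sigma - \alpha)\, d\sigma + G(t, t)\, C_Z(t - \alpha) + \frac{\partial G(t, \alpha)}{\partial t}\, L(\alpha) \end{aligned} \tag{4.120}$$

Now, using $Y(t) = C(t)\, X(t) + v(t) = Z(t) + v(t)$ and the fact that $C_{vY}(t - \alpha) = 0$,
following Eq. (4.119), we can obtain:

$$C_Z(t - \alpha) = E\{Z(t)\, Z'(\alpha)\} = C(t)\, C_{XY}(t - \alpha) = C(t) \int_{t_0}^{t} G(t, \sigma)\, C_Y(\sigma - \alpha)\, d\sigma \tag{4.121}$$

Now, if we combine (4.117), (4.118), (4.119), (4.120), and (4.121), we
obtain:

$$\int_{t_0}^{t} \left[ A(t)\, G(t, \sigma) - \frac{\partial G(t, \sigma)}{\partial t} - G(t, t)\, C(t)\, G(t, \sigma) \right] C_Y(\sigma - \alpha)\, d\sigma = 0, \quad t_0 < \alpha < t \tag{4.122}$$

Then, from the above:

$$A(t)\, G(t, \sigma) - \frac{\partial G(t, \sigma)}{\partial t} - G(t, t)\, C(t)\, G(t, \sigma) = 0, \quad t_0 < \sigma < t \tag{4.123}$$

Since

$$\hat{X}(t) = \int_{t_0}^{t} G(t, \sigma)\, Y(\sigma)\, d\sigma$$

for the optimal solution, combining this with (4.123) yields

$$\begin{aligned} \dot{\hat{X}} &= \int_{t_0}^{t} \frac{\partial G(t, \sigma)}{\partial t}\, Y(\sigma)\, d\sigma + G(t, t)\, Y(t) \\ &= \int_{t_0}^{t} [A(t)\, G(t, \sigma) - G(t, t)\, C(t)\, G(t, \sigma)]\, Y(\sigma)\, d\sigma + G(t, t)\, Y(t) \end{aligned} \tag{4.124}$$

which implies

$$\dot{\hat{X}} = A(t)\, \hat{X}(t) + G(t, t)\, [Y(t) - C(t)\, \hat{X}(t)] = [A(t) - G(t, t)\, C(t)]\, \hat{X}(t) + G(t, t)\, Y(t)$$

Thus,

$$F_1(t) = [A(t) - G(t, t)\, C(t)] = [A(t) - F_2(t)\, C(t)]$$

This part of the proof is done.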

It can be shown (left as an exercise) that:

$$\dot{e}(t) = [A(t) - F_2(t)\, C(t)]\, e(t) + B(t)\, U(t) - F_2(t)\, v(t) \tag{4.125}$$

and

$$C_{XY}(t - \alpha) = C_{XZ}(t - \alpha) \tag{4.126}$$

where

$$e(t | t) = X(t) - \hat{X}(t)$$

We can also obtain $C_Y(\sigma - \alpha)$ as:

$$\begin{aligned} C_Y(\sigma - \alpha) &= E\{Y(\sigma)\, Y'(\alpha)\} \\ &= E\{[Z(\sigma) + v(\sigma)]\, [Z(\alpha) + v(\alpha)]'\} \\ &= C_Z(\sigma - \alpha) + C_v(\sigma - \alpha) \\ &= C_Z(\sigma - \alpha) + L(\alpha)\, \delta(\sigma - \alpha) \end{aligned} \tag{4.127}$$

Thus,

$$C_{XY}(t - \alpha) = \int_{t_0}^{t} G(t, \sigma)\, C_Y(\sigma - \alpha)\, d\sigma$$

becomes

$$C_{XZ}(t - \alpha) = \int_{t_0}^{t} G(t, \sigma)\, C_Z(\sigma - \alpha)\, d\sigma + G(t, \alpha)\, L(\alpha)$$

from which

$$G(t, t)\, L(t) = E\{[X(t) - \hat{X}(t)]\, Z'(t)\} = C_e(t | t)\, C'(t)$$

If we let

$$P(t) = C_e(t | t)$$

and since $L^{-1}(t)$ exists ($L(t)$ is assumed to be positive definite), we can obtain:

$$F_2(t) = G(t, t) = P(t)\, C'(t)\, L^{-1}(t) \tag{4.128}$$

The only thing needed in the proof is to solve for $P(t)$. From Eq. (4.125), let
us solve for $e(t)$ or, equivalently, $e(t | t)$.

Let $\Phi(t, \tau)$ denote the transition matrix of (4.125), then

$$e(t) = \Phi(t, t_0)\, e(t_0) + \int_{t_0}^{t} \Phi(t, s)\, [-F_2(s)\, v(s) + B\, U(s)]\, ds$$

Substituting the above into $P(t) = E\{e(t)\, e'(t)\}$, then, with the assumption that
$e(0)$, $U(s)$, and $v(s)$ are uncorrelated, we obtain (after some manipulation):

$$P(t) = \Phi(t, t_0)\, P(t_0)\, \Phi'(t, t_0) + \int_{t_0}^{t} \Phi(t, s)\, [F_2(s)\, L(s)\, F_2'(s) + BQB']\, \Phi'(t, s)\, ds$$

Thus, upon differentiation, we obtain:

$$\dot{P} = (A - F_2(t)\, C)\, P(t) + P(t)\, (A' - C' F_2'(t)) + F_2(t)\, L\, F_2'(t) + BQB'$$

where we have used

$$\dot{\Phi}(t, t_0) = (A - F_2(t)\, C)\, \Phi(t, t_0)$$

in the above.

Now, if we substitute $F_2(t)$ from (4.128), we shall obtain the result, i.e.,

$$\dot{P} = AP + PA' - PC' L^{-1} CP + BQB'$$

which completes the proof.

Remark 7. Let the gain $F_2(t)$ be changed notationally to $K(t)$ and the gain
$F_1(t)$ to $F(t)$. Then, Eq. (4.100) can be rewritten as:

$$\dot{\hat{X}} = [A(t) - K(t)\, C(t)]\, \hat{X}(t) + K(t)\, Y(t) = A(t)\, \hat{X}(t) + K(t)\, [Y(t) - C(t)\, \hat{X}(t)] \tag{4.129}$$

4.10.2 Prediction 

The solution of the prediction problem is a simple extension of the filtering
problem, and it is actually presented by Kalman in his initial paper.

Suppose we wish to estimate $X(t_1)$ based on the observation $Y(\tau)$ given on the
interval $0 < \tau < t$ for $t_1 > t$. The solution $\hat{X}(t_1 | t)$ is given by:

$$\hat{X}(t_1 | t) = \Phi(t_1, t)\, \hat{X}(t | t), \quad t_1 > t \tag{4.130}$$

where $\Phi(\cdot, \cdot)$ is the transition matrix corresponding to Eq. (4.100).

The covariance matrix is found accordingly. Therefore, for prediction problems,
we must first obtain a filtered estimate of the state, up to the range of
available data.

Thus, $Y(\lambda)$ should be set equal to zero for $\lambda > t$, and $\hat{X}(t)$ serves as the
initial condition in Eq. (4.90).


4.10.3 Smoothing

In smoothing $0 < t_1 < t$, where it is desired to estimate $X(t_1)$, given the
observation over the interval $0 < \tau < t$. Smoothing is far more complicated
than either filtering or prediction. We shall not discuss the smoothing problem
here. The conclusion given by Eq. (4.118) does not hold for smoothing
because for $t_1 < t$, we do not know that $X(\cdot)$ and $U(\cdot)$ are uncorrelated,
which was assumed in filtering and prediction.


4.10.4 Discrete Kalman Recursive Estimation 

In Subsections 4.10.1-4.10.3, we have discussed the continuous model
representing the continuous random processes. We shall now begin the discussion
of the discrete-time version of the problem, since the discrete version must be
utilized for computer implementation. There are a number of inherent advantages:
for example, the discrete algorithms can be manipulated by hand and the
step-by-step processing of information lends itself to a simple development.

In what follows we shall discuss prediction, filtering, and smoothing.


4.10.5 One-Step Prediction 

Consider the discrete dynamic system: 


X(k + 1) = A*k) + BtAk) (4.131) 

Y(k) = CX(k) + **) 
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(4.132) 



The signal and noise have the following statistical moments: 


EU{k) - Er{k) = 0 

(4.133a) 

«**,) lf(k 2 ) = QMk 2 - *,) 

(4.133b) 


(4.133c) 


(4.133d) 


where A, B, C Q> and L ate * X n, q X it, p X p, and q X q matrices, 
respectively, which are in general a function of k. The quantity A(* 2 * is 
defined as follows: 


A (* 2 - *,) 


1, ifk t =k 2 
0, otherwise 


(4.134) 


Q and L are assumed to be positive definite. 

The initial state JT(0)_» assumed to be a random vector with a known a 
priori covariance matrix P( 0). 

We would like to find the estimate of the vector X(k + 1) denoted as 
%(k + 1), which is a linear function of K(0), K(l), . . . , Y(k) minimizing: 

E[X(k * 1 ) - X(k * 1 )' W * 1 ) - X(* + 0] ( 4 . 135 ) 


where W is any positive semi-definite matrix; for example W = / is a proper 
choice, and it can be shown that the optimal solution is independent of the 
choice W. 


The solution to this problem can be obtained by conjecturing that the 
estimator has the form: 

X(k + I) = F t (k) X(k) + F[k) Y(k) (4.136) 


164 



where the matrices F, and F will satisfy the relation: 

F t (k) = A- F[k)C' (4.137) 

#**)- jjftKMCTOC' + Ilr 1 (4.138) 

where P(k) is defined by: 

£|3f(*) - *(*)] IJT(Ar) - *(*)) ' (4.1 39) 


It can be shown that the matrix P(k ) satisfies the following equation: 

P(k + I) * (J- ftf) C) /t*)M - fU) C)' + BQB' + F{k) LFlk)' 

(4.140) 

If we rewrite equations (4.136)-(4.140), we obtain:

\[ \hat{X}(k+1) = [A - F(k)C]\,\hat{X}(k) + F(k)\,Y(k) \tag{4.141} \]

\[ F(k) = A\,P(k)\,C'\,[C\,P(k)\,C' + L]^{-1} \tag{4.142} \]

\[ P(k+1) = [A - F(k)C]\,P(k)\,[A - F(k)C]' + BQB' + F(k)\,L\,F'(k) \tag{4.143} \]

We must provide the a priori conditions $\hat{X}(0)$ and P(0). The problem of
predicting more than one step is a simple extension of the above. For example,
$\hat{X}(k+j)$ for j > 1 can be obtained as:


\[ \hat{X}(k+j) = A^{\,j-1}\,\hat{X}(k+1) \tag{4.144} \]

and the associated covariance matrix is found accordingly. 
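
The recursion (4.141)-(4.143) and the extension (4.144) can be sketched for the
scalar case as follows. This is only an illustrative sketch: the function names,
the observation values, and the parameter values (A = 0.5, B = C = Q = L = 1,
P(0) = 1) are assumptions chosen for the illustration, not part of the
derivation above.

```python
def one_step_predictor(ys, a, b, c, q, l, x0, p0):
    """Scalar one-step predictor, Eqs. (4.141)-(4.143).

    ys     : observations Y(0), Y(1), ..., Y(k)
    a,b,c  : scalar system parameters A, B, C
    q, l   : variances Q (process noise) and L (observation noise)
    x0, p0 : a priori estimate and covariance P(0)
    Returns the list of (estimate of X(k+1), P(k+1)) pairs.
    """
    x_hat, p = x0, p0
    history = []
    for y in ys:
        f = a * p * c / (c * p * c + l)                   # gain, Eq. (4.142)
        x_hat = (a - f * c) * x_hat + f * y               # estimate, Eq. (4.141)
        p = (a - f * c) ** 2 * p + b * q * b + f * l * f  # covariance, Eq. (4.143)
        history.append((x_hat, p))
    return history


def predict_ahead(x_next, a, j):
    """Extend the estimate of X(k+1) to X(k+j), j >= 1, via Eq. (4.144)."""
    return a ** (j - 1) * x_next


# Illustrative (assumed) observations and parameters.
observations = [1.2, 0.4, -0.3, 0.9, 0.1] * 10
result = one_step_predictor(observations, a=0.5, b=1.0, c=1.0,
                            q=1.0, l=1.0, x0=0.0, p0=1.0)
x_last, p_final = result[-1]
x_three_ahead = predict_ahead(x_last, a=0.5, j=3)
```

Note that the covariance sequence does not depend on the observations, so P(k)
can be computed in advance; with these assumed values it settles quickly to a
constant.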


4.10.6 Discrete Filtering 

The filtering problem is the determination of the estimate of X(k) given
the observations Y(0), Y(1), . . . , Y(k). Let us denote the filtered value of
X(k) by $X^0(k)$. It can be shown that $X^0(k)$ is given by:

\[ X^0(k) = A^{-1}\,\hat{X}(k+1) \tag{4.145} \]

where $\hat{X}(k+1)$ is determined from (4.141)-(4.143). By utilizing these
equations we obtain:

\[ X^0(k) = [I - F(k)C]\,A\,X^0(k-1) + F(k)\,Y(k) \]

\[ F(k) = P(k)\,C'\,[C\,P(k)\,C' + L]^{-1} \tag{4.146} \]

\[ P(k+1) = A\,[I - F(k)C]\,P(k)\,A' + BQB' \]

which is the solution to the optimum filter. Note that F(k) in (4.146) is the
filter gain; the predictor gain of (4.142) equals A times this gain.
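
The relation (4.145) between the filtered value and the one-step prediction can
be checked numerically. The sketch below, for the scalar case, runs the
predictor (4.141)-(4.143) and the filter (4.146) on the same data and compares
the prediction with A times the filtered value. The function names, the data,
and the parameter values are assumptions made for the illustration; the filter
is started by taking A X0(k - 1) at k = 0 to be the a priori estimate.

```python
def one_step_predictions(ys, a, b, c, q, l, x0, p0):
    """Predictor of Eqs. (4.141)-(4.143), scalar case."""
    x_hat, p, out = x0, p0, []
    for y in ys:
        f = a * p * c / (c * p * c + l)
        x_hat = (a - f * c) * x_hat + f * y
        p = (a - f * c) ** 2 * p + b * q * b + f * l * f
        out.append(x_hat)
    return out


def filtered_values(ys, a, b, c, q, l, x0, p0):
    """Filter of Eq. (4.146), scalar case."""
    prior, p, out = x0, p0, []        # prior plays the role of A * X0(k-1)
    for y in ys:
        f = p * c / (c * p * c + l)               # filter gain
        x_filt = (1.0 - f * c) * prior + f * y    # filtered value X0(k)
        p = a * (1.0 - f * c) * p * a + b * q * b # P(k+1)
        prior = a * x_filt
        out.append(x_filt)
    return out


# Illustrative (assumed) parameters and observations.
params = dict(a=0.5, b=1.0, c=1.0, q=1.0, l=1.0, x0=0.0, p0=1.0)
ys = [1.2, 0.4, -0.3, 0.9, 0.1]
predictions = one_step_predictions(ys, **params)
filtered = filtered_values(ys, **params)

# Largest difference between the prediction and A times the filtered value.
max_gap = max(abs(pred - params["a"] * filt)
              for pred, filt in zip(predictions, filtered))
```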

4.11 COMBINATION OF UNBIASED ESTIMATORS 

Suppose we are given two unbiased estimates $\hat{X}_1(t)$ and $\hat{X}_2(t)$ of
the same state X(t). There are two cases to consider: either $\hat{X}_1$ and
$\hat{X}_2$ are correlated or they are uncorrelated. We shall discuss both cases
below.


4.11.1 The Estimates are Uncorrelated 

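$\hat{X}_1$ and $\hat{X}_2$ are said to be uncorrelated if

\[ E\,(X - \hat{X}_1)\,(X - \hat{X}_2)' = 0 \]

The optimal estimate of X is obtained as follows:

\[ \hat{X} = P\,(P_1^{-1}\hat{X}_1 + P_2^{-1}\hat{X}_2) \tag{4.147} \]

\[ P = (P_1^{-1} + P_2^{-1})^{-1} \tag{4.148} \]

where $P_i$ is defined for i = 1, 2 by:

\[ P_i = E\,(X - \hat{X}_i)\,(X - \hat{X}_i)' \tag{4.149} \]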
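
For scalar estimates, the combination rule for uncorrelated estimates reduces
to an inverse-variance weighting. The following sketch illustrates this; the
function name and the numerical values are assumptions chosen only for the
illustration.

```python
def combine_uncorrelated(x1, p1, x2, p2):
    """Combine two uncorrelated unbiased scalar estimates.

    x1, x2 : the two estimates
    p1, p2 : their error variances P_1 and P_2
    Returns the combined estimate and its error variance P.
    """
    p = 1.0 / (1.0 / p1 + 1.0 / p2)
    x = p * (x1 / p1 + x2 / p2)
    return x, p


# Assumed example: a poor estimate (variance 4) and a good one (variance 1).
x_comb, p_comb = combine_uncorrelated(10.0, 4.0, 12.0, 1.0)
```

The combined variance is smaller than either individual variance, and the
combined estimate lies closer to the estimate with the smaller variance.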


4.11.2 The Estimates are Correlated 

The solution for correlated estimators is given by: 

\[ \hat{X} = L_1\,\hat{X}_1 + L_2\,\hat{X}_2 \tag{4.150} \]
(4.151)


where 


-aw" 

L t-Vi *v »■»>"' 


Both proofs are simple and are left for the reader to verify. In the next 
chapter we shall apply the estimation theory developed here to two- 
dimensional signals and images. 

EXERCISES

4.1 Given $X_1, X_2, \ldots, X_n$ as random variables such that:

\[ E(X_i) = m \quad \text{and} \quad \operatorname{var}(X_i) = \sigma^2 \]

Assume that $X_i - m$ and $X_j - m$ are orthogonal for $i \neq j$. Let

\[ \hat{m} = \frac{1}{n} \sum_{i=1}^{n} X_i \]


and 


/=i 

be estimates of $m$ and $\sigma^2$.

(a) Determine whether or not $\hat{m}$ is unbiased.

(b) Show that

\[ n^2(\hat{m} - m)^2 = \sum_{i=1}^{n} (X_i - m)^2 + \sum_{i}\sum_{j \neq i} (X_i - m)(X_j - m) \]

Hint: First prove that

\[ n(\hat{m} - m) = \sum_{i=1}^{n} (X_i - m) \]

(c) Determine whether or not $\hat{\sigma}^2$ is an unbiased estimate of $\sigma^2$.

4.2 Let the random variables $X_1$ and $X_2$ be such that:

\[ E(X_1) = E(X_2) = m \]

and

\[ \operatorname{var}(X_1) = \operatorname{var}(X_2) = \sigma^2 \]

with

\[ E[(X_1 - m)(X_2 - m)] = 0 \]

(a) If $X = (X_1, X_2)$, then show that:

C mX *(m 2 ,m 2 ) 

(b) Show that $E(X_1^2) = E(X_2^2) = \sigma^2 + m^2$ and $E(X_1 X_2) = m^2$.

(c) Obtain the covariance of X.

(d) Obtain the m.s.e. of m from the data $(x_1, x_2)$.

(e) Determine the conditions such that your m.s.e. in part (d) is unbiased.

4.3 Let R(t) be the autocorrelation function of a process X(t). Suppose it is
desired to obtain the linear m.s.e. of $X(t + \lambda)$ for some $\lambda > 0$ in
terms of $X(t)$, $X'(t)$, and $X''(t)$; i.e.,
$\hat{X}(t + \lambda) = a_1 X(t) + a_2 X'(t) + a_3 X''(t)$. Use the
orthogonality principle to determine the optimum estimate of $X(t + \lambda)$
and determine the m.s.e. of the error $X(t + \lambda) - \hat{X}(t + \lambda)$.

4.4 The zero mean random variable X is to be estimated in the linear mean
square sense by the random variables $Y_1, Y_2, \ldots, Y_n$, each of mean zero.
Let $\hat{X}$ be such an estimate. Utilizing the orthogonality principle:

(a) Show that $E(e^2) = E[(X - \hat{X})^2] = E[(X - \hat{X})X]$.

(b) Obtain the optimal solution $\hat{X}$.

(c) If $e_m$ is the error corresponding to the optimal solution, i.e.,
$e_m = X - \hat{X}$, then verify whether or not

4.5 Let $Y(t) = s(t) + n(t)$ be given such that $E\,n(t) = E\,s(t) = 0$.

(a) Use the orthogonality principle to estimate $\dot{s}(t) = (d/dt)\,s(t)$, and
show that the optimal estimate (unrealizable) can be obtained from:



Hint: S iy (w) = /w5 r (w), 

(c) Given $R_s(\tau) = \exp(-|\tau|)$ or $S_s(\omega) = 2/(1 + \omega^2)$ and
$R_n(\tau) = 2\,\delta(\tau)$, obtain an optimum estimate $\hat{s}$ with the
constraint of realizability imposed.

(d) In part (c) design an optimum realizable predictor $\hat{s}(t + 1)$.

(e) Design an optimum realizable filter for 

W(0= I s(r - X) cfX 
•'o 

The answers in parts (c)-(e) can be left in the frequency domain. 

4.6 A model is generated when white noise with the variance of unity (unity
spectral density) is passed through a system with the transfer function
$1/[s(s + 1)]$. The model is also contaminated with white noise $n(t)$ with
$S_n(\omega) = 1$. Assume that $E(s(t)\,n(t)) = 0$. Find the transfer function
H(s) of the optimum estimate that will yield the best m.s.e. Also obtain the
transfer function of the best m.s.e. of the derivative.

4.7 Consider the RC network given by: 

[Figure: RC network]

where the unit impulse response h(t) is given by:

\[ h(t) = \frac{1}{\alpha} \exp(-t/\alpha), \quad \text{with } \alpha = RC \]

Let the input to the filter be y(t), given by:

\[ y(t) = s(t) + n(t) \]

where s(t) is given by:

\[ s(t) = A \cos(\omega_0 t + \theta) \ \text{volts}, \quad \omega_0 = \frac{2\pi}{T} \]

with the random variable $\theta$ distributed uniformly over $[0, 2\pi]$. The
amplitude A is constant, and n(t) is a zero mean white noise with its power
spectrum given by

\[ S_n(\omega) = N \ \text{(watts/Hz)} \]


(a) Calculate the input power spectrum. 

(b) Calculate the input power. 

(c) Calculate the output power due to the signal only.

(d) Calculate the output power due to the noise only.

(e) If the signal-to-noise ratio (SNR) is given by:

\[ \text{SNR} = \frac{\text{Output power due to signal}}{\text{Output power due to noise}} \]

then obtain the maximum SNR. 

4.8 Let Y(t) be an observation given by:


\[ Y(t) = s(t) + n(t) \]

where


S a (ot) = , S n (w) = 4, and S n4 (w) = 0 


(a) Find the optimum predictor $\hat{s}(t + \lambda)$ by finding the
corresponding optimum impulse response without the constraint of physical
realizability.

(b) Repeat part (a) with the constraint of realizability imposed.

Hint: you may need to use

\[ 1 + k^2 s^4 = (1 + \sqrt{2k}\,s + k s^2)(1 - \sqrt{2k}\,s + k s^2) \]

You may leave your answers in the frequency domain.


4.9 Let X be a scalar random variable and $\hat{X}_1$ and $\hat{X}_2$ be two
correlated unbiased estimates of X with associated variances (covariances)
$\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\rho$ denote:

\[ \rho = E[(X - \hat{X}_1)(X - \hat{X}_2)] \]

and $\sigma^2$ denote the variance (covariance) associated with $\hat{X}$, where
$\hat{X} = \alpha \hat{X}_1 + \beta \hat{X}_2$.

(a) Show that $\alpha + \beta = 1$ and derive an expression for $\sigma^2$ in
terms of $\sigma_1^2$, $\sigma_2^2$, $\rho$, $\alpha$, and $\beta$.

(b) Obtain the optimal estimate; i.e., determine $\alpha$ and $\beta$ such that
$\hat{X}$ is optimal.


4.10 Let a system be described via the model: 


V 


*,(/)' 




= 


+ 


*t m 


x t (f) 
and





V. 


i 


i 

y” *- 

( 

+ 

. V 2. 


where

\[ E[U U'] = E[v v'] = I\,\delta(t - \tau) \]

Note that U and v are vectors. Write the appropriate equations for the
optimal estimate. What is the error covariance matrix?

4.11 Suppose it is desired to estimate a constant which is unknown; a system
model may be given by:

\[ \dot{X} = 0, \quad Y = X + v \]

where

\[ E[v(t)\,v(\tau)] = Q\,\delta(t - \tau) \]

Obtain a closed form optimal solution.

4.12 Repeat problem 4.11 if the state model is changed to:

\[ \dot{X} = -X + U(t) \]

and $Q = 1/4$, $E[U(t)\,U(\tau)] = 2\,\delta(t - \tau)$, and $E[U v] = 0$.

4.13 A scalar discrete random process X(k) is given by:

\[ X(k+1) = 0.5\,X(k) + U(k) \]

\[ Y(k) = X(k) + v(k) \]

where U(k) and v(k) are white noise terms such that:

\[ E(v^2(k)) = E(U^2(k)) = 1 \]

Also assume that:

\[ E\,X(0) = 0 \]
\[ E[X(0)]^2 = 1 \]

It is obvious that the Kalman estimator (one-step predictor) is given by:

\[ \hat{X}(k+1) = [0.5 - F(k)]\,\hat{X}(k) + F(k)\,Y(k) \]
\[ F(k) = \frac{0.5\,P(k)}{P(k) + 1} \]
\[ P(k+1) = [0.5 - F(k)]^2\,P(k) + 1 + F^2(k) \]
\[ P(0) = 1, \quad \hat{X}(0) = 0 \]


Suppose Y(2) is not received; then perform the following:

(a) Provide the correction (or the adjustment) necessary in the above
Kalman estimator to account for Y(2) not being received.

(b) Calculate the loss in terms of estimation error variance associated
with $\hat{X}(3)$ in part (a). The loss in error variance is denoted by
$\varepsilon(3)$ and is given by:

\[ \varepsilon(3) = \tilde{P}(3) - P(3) \]

where $\tilde{P}(3)$ is the covariance with the observation Y(2) missing.

(c) Calculate the steady-state covariance

\[ \lim_{k \to \infty} P(k) \]

CHAPTER 5

MODELING OF TWO-DIMENSIONAL 
SIGNALS WITH APPLICATION TO 
IMAGE RESTORATION 


5.1 INTRODUCTION

This chapter considers large classes of those two-dimensional images that
are best characterized by statistical procedures, such as specifying their first
two moments (mean and correlation) which represent the brightness level of
the signal (image). Although, in theory, classical image enhancement does not
seem to be very difficult, the implementation of every classical technique has
a drawback because it is not recursive and is seriously hampered by the
presence of noise. Attempts to construct two-dimensional recursive filters
usually fail because of numerical stability problems.

When the image has been contaminated by random noise and the only
information concerning the image is of a statistical nature, image enhancement
is a problem of statistical estimation and filtering. Nahi and Assefi [11] and
Assefi [12] and [13] developed a recursive procedure to estimate the
contaminated image, where the statistical characterization of the image
(two-dimensional signal) is assumed to be spatially stationary. Next, the image
is scanned horizontally, and the two-dimensional correlation functions are
converted into one-dimensional correlation functions via an optical scanner with
its output designated as s(t). The autocorrelation function of s(t) is
nonstationary and nonseparable [14] because of the scanner's periodic movement.
Thus, no finite-dimensional time-invariant dynamic model representing the
statistics of s(t) exists [14,15].

The nonstationarity can be remedied by generating another statistical
process whose autocorrelation function is stationary and which approximates the
autocorrelation function of s(t). The results of this technique are satisfactory.
Since we shall be dealing with the question of realization of autocorrelation
functions and thus spectral factorization, a brief background of spectral
factorization is given.

Nahi and Franco [16] scanned the picture several lines at a time and
derived a vector model which led to a simpler recursive estimator than those
of [11] and [12]. However, it does not take advantage of all the information
available from the image. In other words, the estimation of a given set of lines
does not depend on the data received from the previous lines. Later, Powell
and Silverman [17] viewed the problem in a different light and rederived Nahi
and Franco's results.

Next, we shall utilize a better approximation to s(t) (scanner's output) or
its autocorrelation function developed by partitioning the image into a
collection of vertical strips and approximating s(t) by a series of stationary
random processes, one associated with each strip. For each stationary
approximation, a
corresponding linear time-invariant dynamic model is constructed. A procedure 
for recursively enhancing a degraded image is developed in a manner similar to 
the case where the image has not been partitioned. The major difference is 
that rather than utilizing one dynamical model corresponding to one autocor- 
relation function, a chain of dynamic models corresponding to many auto- 
correlation functions is considered. Examples are constructed to show the 
effectiveness of the enhancement process. 


5.2 SPECTRAL FACTORIZATION 

The concept of spectral factorization has become increasingly important
since Wiener's original work [18] on the subject. Basically, spectral
factorization determines the equations that describe a linear system when the
system is driven by white noise and the covariance of the output is known.
Whenever a process with a given covariance function is generated by driving a
system of first-order differential equations with white noise, we refer to this
system as a dynamical model. More specifically, given a covariance function
$R(t, \tau)$, where $t < t_1$ and $\tau < \tau_1$ for some fixed $t_1$ and
$\tau_1$, the factorization problem is to determine a realizable linear filter
(differential equation model) that, when driven by white noise, yields
$R(t, \tau)$ as its output covariance.

It is well known, [15] and [20], that, in general, no such realization may
exist. However, if its existence were guaranteed, the representation (in some
sense) would be unique. In its most popular form, the spectral factorization
would be confined to stationary situations. Then the corresponding dynamical
model under consideration would be time invariant, and the white noise forcing
function must have started infinitely in the past. This dynamical model
would be asymptotically stable. It is also desirable to deal with
finite-dimensional dynamical models, implying that each linear model must
possess a rational bilateral Laplace transform. We can summarize the above
discussion by the statement of Theorem 1, which we shall not prove, but which is
proved in reference [7].

Theorem 1 

A necessary and sufficient condition that a stationary process y(t) be
representable as the output of an asymptotically stable, time-invariant,
finite-dimensional linear model is that its spectral density R(s) be a rational
function of the form H(s)H(-s), with

\[ H(s) = \frac{q(s)}{p(s)} \tag{5.1} \]

for some polynomial

\[ p(s) = s^n + \sum_{i=0}^{n-1} a_i s^i \]

with all roots in the left half of the s-plane and

\[ q(s) = \sum_{i=0}^{n-1} b_i s^i \]

with degree less than or equal to n - 1 and all roots in the left half of the
s-plane, where $a_i$ and $b_i$ are the real coefficients. That is, H(s) has all
of its poles and zeros in the left half of the s-plane.
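
As a small numerical illustration of the factored form in Theorem 1, consider
the correlation function exp(-mu |tau|), whose spectral density on the
imaginary axis is 2 mu / (mu^2 + omega^2). The sketch below checks that the
stable factor H(s) = sqrt(2 mu)/(s + mu) reproduces this density as H(s)H(-s).
The function names and the value of mu are assumptions made for the
illustration.

```python
import math


def spectral_density(omega, mu):
    """Spectral density of exp(-mu*|tau|), evaluated at s = j*omega."""
    return 2.0 * mu / (mu ** 2 + omega ** 2)


def h_factor(s, mu):
    """Candidate factor H(s) = sqrt(2*mu) / (s + mu); its only pole is at s = -mu."""
    return math.sqrt(2.0 * mu) / (s + mu)


mu = 0.136  # assumed decay rate, used only for this illustration
max_error = 0.0
for omega in [0.0, 0.1, 1.0, 10.0]:
    s = complex(0.0, omega)
    product = h_factor(s, mu) * h_factor(-s, mu)  # H(s) H(-s) on the imaginary axis
    max_error = max(max_error, abs(product - spectral_density(omega, mu)))
```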


5.2.1 Determination of the Output Covariance from a
Linear Dynamical Model

Consider the following dynamical model, given by: 


\[ \dot{x} = A(t)\,x(t) + B(t)\,u(t) \]

\[ y(t) = C(t)\,x(t) \tag{5.2} \]

where x(t) is an n X 1 vector, u is an m X 1 vector, y is a scalar, A, B, and C
are matrices of appropriate dimensions (not necessarily time-invariant), and
u(t) is a zero-mean white noise vector, such that:

\[ E\,u(t)\,u'(\tau) = K\,\delta(t - \tau) \tag{5.3} \]

where K is an m X m symmetrical matrix and prime denotes the transpose.

It is desired to calculate the output covariance (an autocorrelation, since
y(t) is of zero mean) $E\,y(t)\,y(\tau)$, given by:

\[ E\,y(t)\,y(\tau) = C(t)\,E\,x(t)\,x'(\tau)\,C'(\tau) \tag{5.4} \]

Let the random variable $x(t_0)$, where $t_0$ is the initial time, be
statistically independent of u(t). It is well known that the solution x(t) is
given by:

\[ x(t) = \Phi(t, t_0)\,x(t_0) + \int_{t_0}^{t} \Phi(t, \tau)\,B(\tau)\,u(\tau)\,d\tau \tag{5.5} \]

where $\Phi(t, \tau)$ is the state transition matrix; i.e.,

\[ \frac{\partial \Phi(t, \tau)}{\partial t} = A(t)\,\Phi(t, \tau) \tag{5.6} \]

\[ \Phi(\tau, \tau) = I \tag{5.7} \]

Substituting x(t) from Eq. (5.5) into (5.4) and performing some mathematical
operations, we obtain [20]:

\[ E\,y(t)\,y(\tau) = C(t)\,\Phi(t, \tau)\,P_x(\tau)\,C'(\tau)\,1(t - \tau) + C(t)\,P_x(t)\,\Phi'(\tau, t)\,C'(\tau)\,1(\tau - t) \tag{5.8} \]

\[ P_x(t) = E\,x(t)\,x'(t) \tag{5.9} \]

where 1(t) denotes the unit step function.

From the dynamical model (Eq. 5.2), $P_x(t)$ can be shown to be the solution
of the differential equation [20]:

\[ \dot{P}_x = A P_x + P_x A' + B K B' \tag{5.10} \]


where the covariance $P_x(t_0)$ must be given.
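
For a scalar, time-invariant model, Eq. (5.10) reduces to
dP/dt = 2aP + bKb, which can be integrated directly. The sketch below
integrates this scalar equation with a fourth-order Runge-Kutta step and
compares the result with the closed-form solution. The function name, the
step count, and the values a = -1, b = 1, K = 1, P(0) = 0 are assumptions
chosen for the illustration.

```python
import math


def propagate_covariance(a, b, k, p0, t_end, steps=10000):
    """Integrate dP/dt = 2*a*P + b*k*b (scalar form of Eq. 5.10) with RK4."""
    def rate(p):
        return 2.0 * a * p + b * k * b

    h = t_end / steps
    p = p0
    for _ in range(steps):
        k1 = rate(p)
        k2 = rate(p + 0.5 * h * k1)
        k3 = rate(p + 0.5 * h * k2)
        k4 = rate(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


a, b, k_noise = -1.0, 1.0, 1.0                     # assumed model parameters
p_numeric = propagate_covariance(a, b, k_noise, p0=0.0, t_end=5.0)
p_steady = -b * k_noise * b / (2.0 * a)            # value reached as t grows
p_exact = p_steady + (0.0 - p_steady) * math.exp(2.0 * a * 5.0)
```

For a stable model (a < 0) the covariance approaches the constant value
-bKb/(2a), which is the stationary case of Eq. (5.10).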


5.2.2 Independence of Estimation Problem of a
Particular Coordinate System

In spectral realization, y(t), given by Eq. (5.2), is the signal without any
noise contamination. Often, we receive a contaminated observation z(t), given
by:

\[ z(t) = y(t) + w(t) \tag{5.11} \]

where w(t) is additive noise, which is assumed to be uncorrelated with y(t). In
[20] it is shown that the only information necessary for recursive estimation
is the knowledge of $E\,y(t)\,y(t + \tau)$ and $E\,z(t)\,z(t + \tau)$. That is,
the solution of recursive estimation in the mean-square sense is independent of
the particular coordinate system used to model the z(.) and y(.) processes;
hence, a unique solution associated with minimum mean-square estimation can be
obtained where the models for the processes are not given in advance. All these
models are related to one another by a linear transformation. For example, if


\[ \dot{x} = A\,x(t) + B\,u(t) \]

\[ y = C\,x(t) + v(t) \tag{5.12} \]

and

\[ \dot{x}^* = A^* x^*(t) + B^* u(t) \]

\[ y = C^* x^*(t) + v(t) \tag{5.13} \]

correspond to the same realization, then there exists a linear transformation
T(t) such that:

\[ x^*(t) = T(t)\,x(t) \tag{5.14} \]

and

\[ \hat{x}^*(t) = T(t)\,\hat{x}(t) \tag{5.15} \]

where $\hat{x}$ and $\hat{x}^*$ are the estimates corresponding to Eqs. (5.12)
and (5.13), respectively.

The covariance estimates can be obtained accordingly. 


5.3 RECURSIVE IMAGE ESTIMATION 

5.3.1 Procedure Outline

The enhancement of images that are characterized only by statistical data
where the picture contains additive noise is considered in this section. The
random process representing the output scanner is characterized by the output
of a dynamical model with white noise input. The dynamical model describes
the first-order vector Markov process. The procedure of Kalman filtering is
then utilized to recursively determine the minimum mean-square error
estimate of the image. The result is also extended to obtain the smoothing of
data. Two examples, one with very high SNR, are used to illustrate the
effectiveness of the procedure. In what follows, the image is assumed to be a
two-dimensional, stationary correlation function of zero mean. Thus, the
autocorrelation function and the covariance become identical. The statistical
information about the image and the noise is assumed to be known and
uncorrelated, and the noise is additive.


5.3.2 Derivation of Autocorrelation Function of 
Scanner Output 

Let us scan a picture horizontally using an optical scanner whose output is
denoted by s(t). Let the horizontal position (a continuous variable) be denoted
by z, where $0 \le z \le Z$, and the vertical variable by an integer
n = 1, 2, . . . , N representing the nth scanned line. The brightness function
is defined by b(z, n). Let us assume, without any loss of generality, that
b(z, n) is of zero mean. The random process b(z, n) is assumed to be wide-sense
stationary, with the autocorrelation function defined by:

\[ E\,b(z_2, n_2)\,b(z_1, n_1) = R(z_2 - z_1,\, n_2 - n_1) = R(z, n) \tag{5.16} \]


Assume that the scanner output s(t) has a horizontal speed v = 1 and,
without any loss of generality, that the vertical movement takes zero time.
Let us determine $E\,s(t)\,s(t + \tau)$ in terms of R(z, n) and Z. The variables
t and $\tau$ can be equivalently expressed by:

\[ t = jT + \alpha, \quad j = 0, 1, 2, \ldots, N - 1, \quad 0 \le \alpha < T \]

\[ \tau = iT + \gamma, \quad i = \ldots, -1, 0, 1, \ldots, \quad 0 \le t + \tau \le NT, \quad 0 \le \gamma < T \tag{5.17} \]


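where T = Z is the time required to traverse one horizontal line. The scanner
output can now be written as:

\[ s(t) = b(\alpha, j + 1), \qquad s(t + \tau) = \begin{cases} b(\alpha + \gamma,\; i + j + 1), & \text{if } \alpha + \gamma < T \\ b(\alpha + \gamma - T,\; i + j + 2), & \text{if } \alpha + \gamma \ge T \end{cases} \tag{5.18} \]

Now, utilizing Eqs. (5.16) and (5.17), we can obtain:

\[ E\,s(t)\,s(t + \tau) = \begin{cases} R(\gamma, i), & \text{if } \alpha + \gamma < T \\ R(\gamma - T,\; i + 1), & \text{if } \alpha + \gamma \ge T \end{cases} \tag{5.19} \]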


It is clear that $E\,s(t)\,s(t + \tau)$ is a function of both $\alpha$ and
$\gamma$, or, equivalently, of t and $\tau$; thus, it must be nonstationary. The
nonstationarity is due to the edge condition. A simple check shows that
$E\,s(t)\,s(t + \tau)$ is also periodic and a nonseparable function. It can be
demonstrated that no finite-dimensional linear realization of this nonseparable
autocorrelation exists.

We shall now seek to generate a random process denoted as q(t) such that
it has a stationary autocorrelation function which approximates the
autocorrelation of the process s(t). To generate q(t), we proceed as follows.
For a given j, q(t) is defined by:

\[ q(t) = s(jT + \xi) \tag{5.20} \]

where $\xi$ is assumed to be uniformly distributed over [0, T]. We shall now
prove the following theorem.


Theorem 2

The random process q(t) defined by Eq. (5.20) is stationary.


It is easy to verify that:

\[ E\,q(t) = 0 \]

by the construction of q(t).

Next, we must prove that $E\,q(t)\,q(t + \tau)$ is a function of $\tau$ (or,
equivalently, of i and $\gamma$) only. To accomplish this end, we calculate the
correlation function of the process q(t):


\[ E\,q(t)\,q(t + \tau) = E_\xi E_s\,[s(jT + \xi)\,s(jT + \xi + iT + \gamma)] = \frac{1}{T} \int_0^T E_s\,[s(jT + \xi)\,s(jT + \xi + iT + \gamma)]\,d\xi \tag{5.21} \]

This equation is obtained by utilizing Eq. (5.20) and $\tau = iT + \gamma$,
which is given by Eq. (5.17), and the fact that $\xi$ is uniformly distributed
over [0, T]. The subscripts s and $\xi$ in (5.21) denote the expectation with
respect to s and $\xi$, respectively. From Eqs. (5.19) and (5.21), one obtains:

\[ E\,q(t)\,q(t + \tau) = \frac{1}{T} \left[ \int_0^{T - \gamma} R(\gamma, i)\,d\xi + \int_{T - \gamma}^{T} R(T - \gamma,\; i + 1)\,d\xi \right] = \frac{T - \gamma}{T}\,R(\gamma, i) + \frac{\gamma}{T}\,R(T - \gamma,\; i + 1) = r(\tau) \tag{5.22} \]

where $E\,q(t)\,q(t + \tau)$ is defined as $r(\tau)$, which is a function of
$\tau$ (or, equivalently, of i and $\gamma$) only.

It is interesting to note that the correlation function of q(t), namely,
$r(\tau)$, can also be obtained by averaging the autocorrelation function of
s(t) over one period. However, it is important to mention that such averaging
over the subintervals of a period may not give rise to a stationary
autocorrelation function, and, furthermore, may not yield an autocorrelation
function at all.

As an example, consider a scalar random process characterized by a scalar
differential equation:

\[ \dot{x} = -x + u \]
\[ y(t) = \cos(t)\,x(t) \]

where the initial state x(0) = 1/2 and

\[ E\,u(t) = 0 \]

\[ E\,u(t_1)\,u(t_2) = \delta(t_2 - t_1) \]

Then, the autocorrelation of x(t) can be obtained as follows:

\[ E\,x(t)\,x(t + \tau) = \tfrac{1}{2} \exp(-|\tau|) \]

Thus, $E\,y(t)\,y(t + \tau)$ is given by:

\[ E\,y(t)\,y(t + \tau) = \tfrac{1}{2} \cos(t)\,\cos(t + \tau)\,\exp(-|\tau|) \]

which is clearly nonstationary, since the correlation function of y(t) depends
on both t and $t + \tau$ and is periodic (of periodicity $2\pi$). However, if we
averaged this autocorrelation over $[0, \pi/4]$, the resulting average would
depend on both t and $t + \tau$.

The randomization of £ over the period T has the intuitive appeal that all 
points of the picture are weighted equally. 

The following salient properties of $r(\tau)$ will be used in what follows:


\[ r(iT) = R(0, i) \tag{5.23} \]

Since R(z , n) is an autocorrelation function, 

K(O.n) n) (5.24) 

Thus, from (5.22) and (5.23),

-■ Lk - < 1. for all i, 7 (5-25) 


The above properties indicate that, in general, the correlation function
$r(\tau)$ has a periodic nature.


Example 1 

Consider a square picture subdivided into a 32 X 32 grid. Let T = 1 second
and v = 1. The signal is a 12 X 12 square starting at the 13th row and 13th
column. Let m and n represent specific rows and columns, respectively. The
above signal is represented by the brightness level b(m, n) = 6.1 where the
signal exists and -1 otherwise, resulting in a zero mean sample function. As a
first approximation, let us choose:

\[ R(z, i) = a \exp(-\mu_h |z| - \mu_v |i|) \]

where $a$, $\mu_h$, and $\mu_v$ are to be determined. Computation of the sample
power results in a = R(0,0) = 6.1. The correlation between two adjacent grid
points is calculated as 5.33, which is a value for R(1/32, 0) or R(0, 1). Hence,

\[ R(z, i) = 6.1 \exp(-4.35\,|z| - 0.136\,|i|) \]

The correlation function is obtained by substituting the above into Eq. (5.22),
and the plot is shown in Figure 5-1.
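
The numbers in Example 1 can be reproduced approximately with a short
calculation. The sketch below recovers the decay rates from the quoted values
6.1 and 5.33 and evaluates the stationary correlation of Eq. (5.22). The
function and variable names are assumptions made for the illustration, and the
decay rates come out slightly below the rounded values 0.136 and 4.35 quoted in
the text.

```python
import math

T = 1.0                      # time to scan one line, from Example 1
a = 6.1                      # sample power R(0, 0)
rho = 5.33                   # correlation between adjacent grid points
mu_v = math.log(a / rho)     # from R(0, 1) = a * exp(-mu_v)
mu_h = 32.0 * mu_v           # from R(1/32, 0) = a * exp(-mu_h / 32)


def R(z, i):
    """Two-dimensional correlation model of Example 1."""
    return a * math.exp(-mu_h * abs(z) - mu_v * abs(i))


def r(tau):
    """Stationary correlation of Eq. (5.22), using that r is even in tau."""
    tau = abs(tau)
    i = math.floor(tau / T)          # number of whole lines in the lag
    gamma = tau - i * T              # remaining lag within a line
    return (T - gamma) / T * R(gamma, i) + gamma / T * R(T - gamma, i + 1)


# Samples over three line periods, e.g. for plotting.
samples = [(n / 32.0, r(n / 32.0)) for n in range(0, 97)]
```

The sampled values show peaks at integer multiples of T, which is the periodic
nature noted before Example 1.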



Fig. 5-1. Plot of $r(\tau)$ and $r_a(\tau)$ (Dashed Curve) as a Function of $\tau$


5.3.3 Dynamical Modeling of Image Statistics 

In this section, we wish to derive a differential equation model whose
solution has an autocorrelation function approximating $r(\tau)$ given by
Eq. (5.22). Since we subsequently intend to utilize a Kalman estimator, we
seek a dynamical model of the form:

\[ \dot{x}(t) = A\,x(t) + B\,u(t) \]

\[ y(t) = C\,x(t) \tag{5.26} \]

where x(t) is an n-dimensional vector, u(t) is a white noise vector, and y(t) is
the scalar signal whose autocorrelation function is $r(\tau)$.

The procedure followed is to represent an approximation to $r(\tau)$, denoted
by $r_a(\tau)$, as a sum of terms such that each term can be easily modeled,
since, in general, $r(\tau)$ may not have a rational bilateral transform. The
properties of $r(\tau)$ may be utilized to decompose $r(\tau)$ into the product
of two functions $h(\tau)$ and $r(\tau)/h(\tau)$:

\[ r(\tau) = h(\tau)\,\frac{r(\tau)}{h(\tau)} \tag{5.27} \]

where $h(\tau)$ is chosen to satisfy:

\[ h(iT) = R(0, i), \quad \text{for all } i \tag{5.28} \]


Since in many practical cases the two-dimensional correlation function
R(z, i) is a monotonically decreasing function of i, a natural candidate for
$h(\tau)$ is, in those instances, a combination of negative exponentials; i.e.,

\[ h(\tau) = \sum_{i=1}^{I} f_i \exp(-\lambda_i |\tau|) \tag{5.29} \]

The function $p(\tau)$ is then chosen to be a periodic function approximating
$r(\tau)/h(\tau)$. The approximate correlation function is:

\[ r_a(\tau) = h(\tau)\,p(\tau) \tag{5.30} \]


Utilizing Eqs. (5.23) and (5.28), it can be seen that the function
$r(\tau)/h(\tau)$ is unity at $\tau = iT$ and less than unity for all other
$\tau$; furthermore, from (5.22) and (5.29) it is an even function. Hence,
$p(\tau)$ is chosen to be an even function with period T. Thus, a natural
candidate for this function is:

\[ p(\tau) = \sum_{j=0}^{J} a_j \cos \frac{2\pi j \tau}{T} \tag{5.31} \]

Consequently, an element of the function $r_a(\tau)$ has the form:

\[ f_i\,a_j \exp(-\lambda_i |\tau|) \cos \frac{2\pi j \tau}{T} \tag{5.32} \]

and there are (J + 1)I such elements.

A differential equation model with white noise input can be simply
constructed [8] to model each of these terms. Each will be a second-order system
except for those corresponding to j = 0; i.e.,

\[ f_i\,a_0 \exp(-\lambda_i |\tau|) \]

which will be of first order. If the white noise forcing functions (one being
necessary for each i, j pair) are chosen to be mutually independent, the
collection of all these differential equations defines the parameters A, B, C
and represents the desired model for $r_a(\tau)$.

In the course of selecting the approximate function $r_a(\tau)$, we must choose
the coefficients properly, such that $r_a(\tau)$ is a correlation function. We
shall either guarantee that $r_a(\tau)$ is a positive definite function or,
equivalently, that the spectral density of $r_a(\tau)$ is positive [9].

Example 2 

Using Example 1, let us derive a dynamic model for $r(\tau)$. Assume that the
desired model has the form given by Eq. (5.26), and further that:

\[ E\,u(t)\,u(t + \tau)' = \Lambda\,\delta(\tau) \tag{5.33} \]

where $\delta(\tau)$ is the Dirac delta function, the prime denotes the
transpose, $\Lambda$ is a positive definite matrix, and

\[ E\,y(t)\,y(t + \tau) = r_a(\tau) \tag{5.34} \]

Because of the exponential nature of R(z, i), we choose:

\[ h(\tau) = R(0, 0) \exp(-0.136\,|\tau|) \tag{5.35} \]

and

\[ p(\tau) = \sum_{j=0}^{J} a_j \cos 2\pi j \tau \tag{5.36} \]

In this example, we use the notation $\mu_v$ instead of 0.136.

The modeling procedure can be broken down as follows. The first term of
$r_a(\tau)$, namely,

\[ a_0 \exp(-\mu_v |\tau|) \]

has the bilateral transform:

\[ R_1(s) = \frac{2 a_0 \mu_v}{(s + \mu_v)(-s + \mu_v)} \tag{5.37} \]

The function $R_1(s)$ can now be factored into two functions, $H_1(s)$ and
$H_1(-s)$, where

\[ R_1(s) = \frac{\sqrt{2 a_0 \mu_v}}{s + \mu_v} \cdot \frac{\sqrt{2 a_0 \mu_v}}{-s + \mu_v} \]

and

\[ H_1(s) = \frac{\sqrt{2 a_0 \mu_v}}{s + \mu_v} \]

Utilizing the method of this section, a dynamic realization of $H_1(s)$ is
obtained as:

\[ \dot{x}_1 = -\mu_v\,x_1(t) + \sqrt{2 a_0 \mu_v}\;u_1(t) \]

\[ y_1(t) = x_1(t) \tag{5.38} \]
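
The second term of $r_a(\tau)$, namely,

\[ a_1 \exp(-\mu_v |\tau|) \cos 2\pi\tau \]

has the following bilateral transform:

\[ R_2(s) = \frac{2 a_1 \mu_v\,[-s^2 + (2\pi)^2 + \mu_v^2]}{[(s + \mu_v)^2 + (2\pi)^2]\,[(-s + \mu_v)^2 + (2\pi)^2]} \]

The function $R_2(s)$ can be factored into two functions, $H_2(s)$ and
$H_2(-s)$:

\[ R_2(s) = \frac{\sqrt{2 a_1 \mu_v}\,[s + \sqrt{(2\pi)^2 + \mu_v^2}]}{(s + \mu_v)^2 + (2\pi)^2} \cdot \frac{\sqrt{2 a_1 \mu_v}\,[-s + \sqrt{(2\pi)^2 + \mu_v^2}]}{(-s + \mu_v)^2 + (2\pi)^2} \]

where $H_2(s)$ is given by:

\[ H_2(s) = \frac{\sqrt{2 a_1 \mu_v}\,[s + \sqrt{(2\pi)^2 + \mu_v^2}]}{(s + \mu_v)^2 + (2\pi)^2} \]

The corresponding dynamic realization of $H_2(s)$ is given as:

\[ \dot{x}^{(2)} = A^{(2)} x^{(2)}(t) + B^{(2)} u^{(2)}(t) \]

\[ y^{(2)}(t) = C^{(2)} x^{(2)}(t) \]

where the superscript denotes the model corresponding to the appropriate
term. The coefficients $A^{(2)}$, $B^{(2)}$, and $C^{(2)}$ are given as:

\[ A^{(2)} = \begin{bmatrix} 0 & 1 \\ -(2\pi)^2 - \mu_v^2 & -2\mu_v \end{bmatrix} \]

\[ B^{(2)} = \sqrt{2 a_1 \mu_v} \begin{bmatrix} 1 \\ \sqrt{(2\pi)^2 + \mu_v^2} - 2\mu_v \end{bmatrix} \]

\[ C^{(2)} = [\,1 \;\; 0\,] \]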
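In general, the (k + 1) term of $r_a(\tau)$ is
$a_k \exp(-\mu_v |\tau|) \cos 2\pi k \tau$, which has the bilateral transform
$R_{k+1}(s)$, given by:

\[ R_{k+1}(s) = \frac{2 a_k \mu_v\,[-s^2 + (2\pi k)^2 + \mu_v^2]}{[(s + \mu_v)^2 + (2\pi k)^2]\,[(-s + \mu_v)^2 + (2\pi k)^2]} \tag{5.39} \]

As before, the function $R_{k+1}(s)$ can be factored into two functions,
$H_{k+1}(s)$ and $H_{k+1}(-s)$:

\[ R_{k+1}(s) = \frac{\sqrt{2 a_k \mu_v}\,[s + \sqrt{(2\pi k)^2 + \mu_v^2}]}{(s + \mu_v)^2 + (2\pi k)^2} \cdot \frac{\sqrt{2 a_k \mu_v}\,[-s + \sqrt{(2\pi k)^2 + \mu_v^2}]}{(-s + \mu_v)^2 + (2\pi k)^2} \]

where

\[ H_{k+1}(s) = \frac{\sqrt{2 a_k \mu_v}\,[s + \sqrt{(2\pi k)^2 + \mu_v^2}]}{(s + \mu_v)^2 + (2\pi k)^2} \]

and the corresponding dynamical model is:

\[ \dot{x}^{(k+1)} = A^{(k+1)} x^{(k+1)}(t) + B^{(k+1)} u^{(k+1)}(t) \]

\[ y^{(k+1)}(t) = C^{(k+1)} x^{(k+1)}(t) \tag{5.40} \]

where

\[ A^{(k+1)} = \begin{bmatrix} 0 & 1 \\ -(2\pi k)^2 - \mu_v^2 & -2\mu_v \end{bmatrix} \tag{5.41} \]

\[ B^{(k+1)} = \sqrt{2 a_k \mu_v} \begin{bmatrix} 1 \\ \sqrt{(2\pi k)^2 + \mu_v^2} - 2\mu_v \end{bmatrix} \tag{5.42} \]

\[ C^{(k+1)} = [\,1 \;\; 0\,] \tag{5.43} \]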

It can be seen that the first term of $r_a(\tau)$ is modeled by Eq. (5.38),
which is a first-order system, and the subsequent terms by (5.40), which is a
second-order system. Thus, to model the (J + 1) terms of $r_a(\tau)$, we need a
(2J + 1)-order system. For example, suppose the function $r_a(\tau)$ has
(J + 1) terms; then we can incorporate the first- and second-order systems into
a new system, whose parameters A, B, and C are block diagonal and are obtained
as follows:

\[ A = \operatorname{diag}\left( -\mu_v,\; A^{(2)},\; \ldots,\; A^{(J+1)} \right) \tag{5.44} \]

\[ B = \operatorname{diag}\left( \sqrt{2 a_0 \mu_v},\; B^{(2)},\; \ldots,\; B^{(J+1)} \right) \tag{5.45} \]

\[ C = [\,1 \;\; 1 \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 1 \;\; 0\,] \tag{5.46} \]




Example 3 

If in Example 2 only three terms of $r_a(\tau)$ are retained, i.e., J = 2, the
resultant $r_a(\tau)$ can be written as:

\[ r_a(\tau) = 6.1 \exp(-0.136\,|\tau|) \sum_{j=0}^{2} a_j \cos 2\pi j \tau \]

If we use the Fourier series for $p(\tau)$, then $a_0$, $a_1$, and $a_2$ will be
given as:

\[ a_0 = 0.333; \quad a_1 = 0.405; \quad a_2 = 0.101 \]


A plot of $r_a(\tau)$ is shown in Figure 5-1. The correlation term

\[ 6.1\,a_0 \exp(-0.136\,|\tau|) \]

is modeled by:

\[ \dot{x}_1 = -0.136\,x_1(t) + 0.743\,u_1 \]

The second term in the correlation is modeled by:

\[ \dot{x}_2 = x_3 + 0.82\,u_2 \]

\[ \dot{x}_3 = -39.4\,x_2 - 0.27\,x_3 + 4.92\,u_2 \]

and the third term is modeled in a similar manner. The terms $u_1$, $u_2$, and
$u_3$ represent independent white-noise terms, each with zero mean and
correlation function $\delta(\tau)$, where $\delta$ is the Dirac delta function.
The final results are:


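\[ A = \begin{bmatrix} -0.136 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & -39.4 & -0.27 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -157.7 & -0.27 \end{bmatrix} \]

\[ B = \begin{bmatrix} 0.743 & 0 & 0 \\ 0 & 0.820 & 0 \\ 0 & 4.92 & 0 \\ 0 & 0 & 0.410 \\ 0 & 0 & 5.04 \end{bmatrix} \]

\[ C = [\,1 \;\; 1 \;\; 0 \;\; 1 \;\; 0\,] \]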
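
The entries of B in Example 3 can be recomputed from the first-order and
second-order realizations of Example 2. The sketch below does this for the
three retained terms; the function names are assumptions made for the
illustration, and each weight is taken to be 6.1 times the corresponding
Fourier coefficient.

```python
import math

R00 = 6.1                          # R(0, 0) from Example 1
mu_v = 0.136                       # vertical decay rate from Example 1
a_coeffs = [0.333, 0.405, 0.101]   # a_0, a_1, a_2 from Example 3


def first_order_term(weight, mu):
    """Return (A, B) modeling weight * exp(-mu*|tau|), with output y = x."""
    return -mu, math.sqrt(2.0 * weight * mu)


def second_order_term(weight, mu, k):
    """Return (A, B) modeling weight * exp(-mu*|tau|) * cos(2*pi*k*tau), C = [1, 0]."""
    w2 = (2.0 * math.pi * k) ** 2 + mu ** 2
    gain = math.sqrt(2.0 * weight * mu)
    A = [[0.0, 1.0], [-w2, -2.0 * mu]]
    B = [gain, gain * (math.sqrt(w2) - 2.0 * mu)]
    return A, B


A1, B1 = first_order_term(R00 * a_coeffs[0], mu_v)
A2, B2 = second_order_term(R00 * a_coeffs[1], mu_v, 1)
A3, B3 = second_order_term(R00 * a_coeffs[2], mu_v, 2)
```

With these inputs the computed B entries agree with the rounded values 0.743,
0.820, 4.92, 0.410, and 5.04; the computed entries -(2 pi k)^2 - mu_v^2 of A
come out slightly larger in magnitude than the quoted -39.4 and -157.7.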


Often, two-dimensional stationary correlation functions can be approximated
by a combination of two-dimensional stationary correlation functions
of the form:

\[ R(z, i) = R(0, 0) \exp(-\mu_h |z| - \mu_v |i|) \tag{5.47} \]

Because of the importance of R(z, i) as given by Eq. (5.47), we shall discuss
this special autocorrelation function below.

Calculating $r(\tau)$ (given by Eq. 5.22), one obtains:

\[ r(\tau) = \frac{T - \gamma}{T}\,R(0, 0) \exp(-\mu_h |\gamma| - \mu_v |i|) + \frac{\gamma}{T}\,R(0, 0) \exp(-\mu_h |T - \gamma| - \mu_v |i + 1|) \]

where

\[ \tau = iT + \gamma \]


Now, let us define a risk function $\mathcal{R}$ such that


r i 


(Kt) - r (T)1 J dr 


and 


r a {T) * 2 a i exp tI ) COS ^ T 


1=0 


(5.48) 


(5.49) 


(5.50) 


We can select the coefficients $a_i$ such that the risk function $\mathcal{R}$
is minimized. For simplicity, we shall assume that T = 1. It can be shown that
$\mathcal{R}$ can be expressed by [16]:




(5.51) 


To minimize $\mathcal{R}$, we must minimize:

$$ \int_0^\infty [r(\tau) - r_a(\tau)]^2\, d\tau $$

Thus, the minimization of $\mathcal{R}$ becomes a simple problem, and the risk
function can be obtained from [16]. The procedure is to set the derivatives of
$\mathcal{R}$ with respect to $a_j$ equal to zero, and the result can be obtained as
follows:

$$ a = \alpha^{-1} d \tag{5.52} $$

where $a$ is the column vector of the coefficients $a_j$, and $\alpha$ is a matrix
whose elements are given by:

$$ \alpha_{kj} = \int_0^\infty \exp(-2\mu_v|\tau|) \cos 2\pi k\tau \, \cos 2\pi j\tau \, d\tau \tag{5.53} $$

and $d$ is a column vector, whose elements are given by:

$$ d_k = \int_0^\infty r(\tau) \exp(-\mu_v|\tau|) \cos 2\pi k\tau \, d\tau \tag{5.54} $$

Furthermore, the following properties can easily be established:

$$ \int_0^\infty [r(\tau) - r_a(\tau)]^2\, d\tau = \int_0^\infty r^2(\tau)\, d\tau - \int_0^\infty r_a^2(\tau)\, d\tau \tag{5.55} $$

$$ \int_0^\infty r^2(\tau)\, d\tau = \lim_{J \to \infty} \int_0^\infty r_a^2(\tau)\, d\tau \tag{5.56} $$
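The normal equations of Eqs. (5.52)-(5.54) can be sketched numerically as follows. This is an illustration only: it assumes T = 1, takes the basis functions to be exp(-mu|tau|) cos(2 pi j tau), truncates the infinite integration range at a finite `upper`, and uses a trapezoid rule. The function name and its parameters are hypothetical.

```python
import numpy as np

def fit_cosine_coeffs(r, mu, J, upper=40.0, n=200001):
    """Least-squares coefficients a_0..a_J for
        r_a(tau) = sum_j a_j * exp(-mu*tau) * cos(2*pi*j*tau),  tau >= 0,
    found by solving alpha @ a = d, where alpha and d are the integrals of
    Eqs. (5.53) and (5.54), truncated at `upper` (an assumption)."""
    tau = np.linspace(0.0, upper, n)
    h = tau[1] - tau[0]
    w = np.full(n, h)
    w[0] = w[-1] = h / 2.0                      # trapezoid weights
    basis = np.array([np.exp(-mu * tau) * np.cos(2 * np.pi * j * tau)
                      for j in range(J + 1)])
    alpha = (basis * w) @ basis.T               # matrix of Eq. (5.53)
    d = (basis * w) @ r(tau)                    # vector of Eq. (5.54)
    return np.linalg.solve(alpha, d)

# Synthetic check: a correlation that lies exactly in the span of the basis.
mu = 0.136
target = lambda t: np.exp(-mu * t) * (0.5 + 0.3 * np.cos(2 * np.pi * t))
coeffs = fit_cosine_coeffs(target, mu, J=2)
print(np.round(coeffs, 6))
```

Because the synthetic target is itself a combination of the first two basis functions, the solver should return those weights and a third coefficient near zero; for a measured r(tau) the residual risk is generally nonzero.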


5.3.4 Design of a One-Step Predictor 

Since we intend to utilize a digital computer for the estimation process, the
model given by Eq. (5.26) is discretized, yielding:

$$ x(k+1) = \bar{A} x(k) + \bar{B} u(k) $$

$$ y(k) = \bar{C} x(k) + v(k) \tag{5.57} $$

In addition, the model given by Eq. (5.57) contains the observation noise
element $v(k)$, which is assumed to be white, with mean zero and variance $\sigma^2$.
The parameters $\bar{A}$, $\bar{B}$, and $\bar{C}$ are related to $A$, $B$, and $C$ by:

$$ \bar{A} = \exp\!\left(A \frac{T}{N}\right) $$

$$ \bar{B}\bar{K}\bar{B}' = \int_0^{T/N} \exp\!\left(A \frac{T}{N}\right) \exp(-As)\, BKB' \exp(-A's) \exp\!\left(A' \frac{T}{N}\right) ds $$

$$ \bar{C} = C \tag{5.58} $$

where $K$ and $\bar{K}$ are covariances of $u(t)$ and $u(k)$, respectively. The sampling
interval utilized in the above discretization is chosen to be $T/N$. Thus, there
will be $N$ observations for each horizontal scan. Since there are $N$ horizontal
scan lines, the final discrete observation is on an $N \times N$ grid.
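The discretization can be sketched in Python as below. This is a rough illustration, not the author's program: it assumes the continuous model of Example 3, a unit covariance K, T = 1 and N = 32, and it evaluates the integral by Simpson's rule. The helpers `expm` and `discretize` are our own names.

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by scaling-and-squaring with a Taylor series
    (adequate for the small, well-scaled matrices used here)."""
    M = np.asarray(M, dtype=float)
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = M / (2 ** s)
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def discretize(A, B, K, h, steps=200):
    """Return (A_bar, Q) with A_bar = exp(A h) and
    Q = int_0^h exp(A (h - s)) B K B' exp(A' (h - s)) ds (Simpson's rule,
    `steps` must be even)."""
    A = np.asarray(A, dtype=float)
    BKBt = B @ K @ B.T
    A_bar = expm(A * h)
    s = np.linspace(0.0, h, steps + 1)
    w = np.ones(steps + 1)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    w *= h / (3.0 * steps)
    Q = np.zeros_like(A)
    for si, wi in zip(s, w):
        Phi = expm(A * (h - si))
        Q += wi * Phi @ BKBt @ Phi.T
    return A_bar, Q

# Continuous-time model of Example 3 (K = identity is an assumption).
A = np.array([[-0.136, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, -39.4, -0.27, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, -157.7, -0.27]])
B = np.array([[0.743, 0, 0],
              [0, 0.820, 0],
              [0, 4.92, 0],
              [0, 0, 0.410],
              [0, 0, 5.04]])
A_bar, Q = discretize(A, B, np.eye(3), h=1.0 / 32)
print(np.round(A_bar, 3))
print(np.round(Q, 2))
```

The matrix Q plays the role of the product on the left-hand side of the second relation in Eq. (5.58); only that product, not the individual factors, is needed by the estimator.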


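Example 4

Continuing Example 3, we obtain:

$$ \bar{A} = \begin{bmatrix}
0.996 & 0 & 0 & 0 & 0 \\
0 & 0.983 & 0.031 & 0 & 0 \\
0 & -1.22 & 0.97 & 0 & 0 \\
0 & 0 & 0 & 0.926 & 0.03 \\
0 & 0 & 0 & -4.77 & 0.913
\end{bmatrix} $$

$$ \bar{B}\bar{K}\bar{B}' = \begin{bmatrix}
0.02 & 0 & 0 & 0 & 0 \\
0 & 0.02 & 0.12 & 0 & 0 \\
0 & 0.12 & 0.60 & 0 & 0 \\
0 & 0 & 0 & 0.01 & 0.07 \\
0 & 0 & 0 & 0.07 & 0.49
\end{bmatrix} $$

$$ \bar{C} = C = [\,1 \;\; 1 \;\; 0 \;\; 1 \;\; 0\,] $$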

Utilizing the model given by Eq. (5.57) with parameters given by Eq. (5.58), a
(one-step predictor) recursive estimator may be designed (see Chapter 4). The
equations are given for the sake of completeness.

$$ \hat{x}(k+1) = [\bar{A} - F(k)\bar{C}]\, \hat{x}(k) + F(k)\, y(k) $$

$$ P(k+1) = [\bar{A} - F(k)\bar{C}]\, P(k)\, [\bar{A} - F(k)\bar{C}]' + \bar{B}\bar{K}\bar{B}' + F(k)F'(k)\, \sigma^2 $$

$$ F(k) = \bar{A} P(k) \bar{C}'\, [\bar{C} P(k) \bar{C}' + \sigma^2]^{-1} \tag{5.59} $$

The (one-step predicted) estimate of the image is, therefore,

$$ \hat{y}(k) \triangleq \bar{C} \hat{x}(k) $$

that is, $\hat{y}(k)$ is the best estimate of $y(k)$, obtained recursively in real time,
where $y(k)$ is the observation associated with the grid point immediately
ahead of the scanner position.
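The recursion of Eq. (5.59) can be sketched in Python as follows. This is a minimal illustration for a scalar observation; the function name, the zero initial state, and the identity initial covariance are assumptions, and `Q` stands for the discretized input covariance product of Eq. (5.58).

```python
import numpy as np

def one_step_predictor(y, A_bar, C_bar, Q, sigma2, P0=None):
    """Run the one-step predictor over scalar observations y[0..n-1].
    Returns the predicted outputs C x_hat(k) and the variances C P(k) C'."""
    n = A_bar.shape[0]
    C = np.asarray(C_bar, dtype=float).reshape(1, n)
    x_hat = np.zeros((n, 1))                    # assumed initial estimate
    P = np.eye(n) if P0 is None else np.array(P0, dtype=float)
    y_hat = np.empty(len(y))
    var = np.empty(len(y))
    for k, yk in enumerate(y):
        y_hat[k] = (C @ x_hat).item()           # prediction before seeing y(k)
        s = (C @ P @ C.T).item()
        var[k] = s
        F = A_bar @ P @ C.T / (s + sigma2)      # gain F(k)
        L = A_bar - F @ C
        x_hat = L @ x_hat + F * yk
        P = L @ P @ L.T + Q + sigma2 * (F @ F.T)
    return y_hat, var

# Scalar toy model (hypothetical numbers) whose steady-state variance is sqrt(0.19).
rng = np.random.default_rng(0)
obs = rng.standard_normal(200)
pred, pred_var = one_step_predictor(obs, np.array([[0.9]]), [1.0],
                                    np.array([[0.19]]), 1.0)
print(pred_var[-1])
```

The covariance recursion does not depend on the data, so the gain sequence F(k) could be computed once and stored before any scan line is processed.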


Example 5

The signal $y(k)$ is generated by using the image described in the preceding
example and adding white noise with variance $\sigma^2$. Let us define a measure of
signal-to-noise ratio by:

$$ \rho = \frac{\text{peak-to-peak variation of signal}}{\sigma} $$

The peak-to-peak variation of the image is 7.1. Two values of $\rho$ are considered
here, namely, 7.1/3 and 7.1/10; the corresponding values of $y(k)$ and their
one-step predicted values $\hat{y}(k)$ are shown in Figures 5-2a and 5-2b and 5-3a
and 5-3b, respectively.


5.3.5 Implementation of Required Interpolation 

It is clear that image enhancement, from the point of view of scanner
output, represents an interpolation problem; i.e., it is desired to determine the
best estimate of $y(k)$, $0 < k < N$, given the observations $y(0), y(1), \ldots, y(N)$.
In general, the interpolation problem is far more complicated [10] than standard
Kalman filtering. However, since for the image enhancement considered
here the length of the data is fixed ($N$) and, furthermore, the observation is
usually available for additional repeated processing, it is possible to obtain two
one-step predicted values of $y(k)$, denoted by $\hat{y}(k)$ and $\bar{y}(k)$, one by running
the scanner in one direction starting, for example, at the top left corner of
the picture and the other by running the scanner in the reverse direction
starting at the bottom right corner. Associated with these estimates are estimation
error variances denoted by $\sigma^2(k) = \bar{C}P(k)\bar{C}'$ and $\bar{\sigma}^2(k) = \bar{C}\bar{P}(k)\bar{C}'$,
respectively. The two estimates must be combined to yield the optimal interpolated
(smoothed) value $y^*(k)$. Thus, a brief discussion of combining two estimators
is warranted.

Suppose we are given two state estimates, $\hat{x}(t)$ and $\bar{x}(t)$, of the same state
variable $x(t)$. There are two cases to consider: either $\hat{x}(t)$ and $\bar{x}(t)$ are
correlated or they are uncorrelated. We shall consider only the case in which both
are uncorrelated; i.e.,

$$ E[x - \hat{x}][x - \bar{x}]' = 0 \tag{5.60} $$

In this case the optimal estimate of $x$, denoted by $x^*(t)$, is given by:

$$ x^* = P^*(P^{-1}\hat{x} + \bar{P}^{-1}\bar{x}) \tag{5.61} $$

$$ P^* = (P^{-1} + \bar{P}^{-1})^{-1} \tag{5.62} $$

where $P$ and $\bar{P}$ are the error covariances of $\hat{x}$ and $\bar{x}$, respectively. Thus,
applying Eqs. (5.60), (5.61), and (5.62) to $\hat{y}(k) = \bar{C}\hat{x}$ and $\bar{y}(k) = \bar{C}\bar{x}$
yields:

$$ y^*(k) = \frac{\bar{\sigma}^2(k)}{\sigma^2(k) + \bar{\sigma}^2(k)}\, \hat{y}(k)
 + \frac{\sigma^2(k)}{\sigma^2(k) + \bar{\sigma}^2(k)}\, \bar{y}(k) \tag{5.63} $$
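For scalar estimates the combination rule is an inverse-variance weighting, which can be sketched as follows. The function name and the sample numbers are ours, and the two estimation errors are assumed uncorrelated, as in the text.

```python
def combine_estimates(y_fwd, var_fwd, y_rev, var_rev):
    """Inverse-variance combination of two uncorrelated estimates of the same
    quantity; returns the combined estimate and its error variance."""
    total = var_fwd + var_rev
    y_star = (var_rev * y_fwd + var_fwd * y_rev) / total
    var_star = var_fwd * var_rev / total
    return y_star, var_star

# Equal variances give the plain average; unequal ones favor the better estimate.
equal_case = combine_estimates(1.0, 2.0, 3.0, 2.0)
unequal_case = combine_estimates(1.0, 1.0, 3.0, 3.0)
print(equal_case, unequal_case)
```

Each estimate is weighted by the variance of the other one, so the more reliable pass dominates, and the combined variance is never larger than either input variance.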
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Example 6 

Considering the preceding example, the covariance $P(k)$ in Eq. (5.59) nearly
reaches its steady-state value in about two or three scan lines. Consequently,
$\sigma(k) \approx \bar{\sigma}(k)$ for most of the picture, and Eq. (5.63) reduces to:

$$ y^*(k) \approx \tfrac{1}{2}\,[\hat{y}(k) + \bar{y}(k)] \tag{5.64} $$

Equation (5.64) was implemented, and the results for $\rho$ = 7.1/3 and 7.1/10
appear in Figures 5-2c and 5-3c, respectively.

Careful observation of Figures 5-2b and 5-2c (or 5-3b and 5-3c) reveals a
consistent vertical correlation, which is attributed to the approximation of $r(\tau)$.
A second estimate is obtained by transposing the original picture and
re-evaluating $y^*(k)$. The two values are averaged and are represented in
Figures 5-2d and 5-3d for the corresponding values of $\rho$. In what follows the
approximation is further improved.

Fig. 5-2. Observation and Estimates for $\rho$ = 7/3

Fig. 5-3. Observation and Estimates for $\rho$ = 7/10


5.4 PARTIAL RANDOMIZATION

Randomization over the period $T$ has the intuitive appeal that all points are
weighted equally. While the results concerning this approach for a certain
subclass of nonstationary correlation functions are gratifying, it may lead to
some shortcomings. For example, the extreme right edge of a scanned line and
the extreme left edge of the next line are weighted as two adjacent points of a
line. In order to improve the accuracy of our approximation, we shall discuss
the idea of partial randomization, which assumes $\xi$ is randomly distributed over
subintervals of $(0, T]$. Intuitively, it can be seen that the larger the number of
subdivisions, the closer we approximate the correlation function of the scanner
output. Thus, we shall subdivide the image in the manner given below.

Let us subdivide $[0, T]$ into $M$ parts such that:

$$ 0 = T_0 < T_1 < T_2 < \cdots < T_M = T \tag{5.65} $$

Let $\Delta_\eta$ be defined as:

$$ \Delta_\eta = T_\eta - T_{\eta-1}, \quad \text{for } \eta = 1, 2, \ldots, M \tag{5.66} $$

Now for given $t = jT + \sigma$, where $\sigma \in [T_{\eta-1}, T_\eta]$, in a manner similar to that
before, let $q_\eta(t)$ be a random variable such that:

$$ q_\eta(t) = s(jT + \xi) \tag{5.67} $$

where $\xi$ is assumed to be uniformly distributed over $(T_{\eta-1}, T_\eta]$ for
$\eta = 1, 2, \ldots, M$, and $q_\eta(t)$ is not defined elsewhere. Now we shall prove the
following theorem.

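Theorem 3

The random process $q_\eta(t)$ defined by Eq. (5.67) is stationary.

Proof

It is easy to verify that

$$ E\, q_\eta(t) = 0 $$

by construction of $q_\eta(t)$.

Next, we shall prove that $E\, q_\eta(t) q_\eta(t + \tau)$ is a function of $\tau$ (or,
equivalently, of $i$ and $\gamma$) only. $E\, q_\eta(t) q_\eta(t + \tau)$ can be calculated as follows:

$$ E\, q_\eta(t) q_\eta(t + \tau) = E_\xi E_s\, s(jT + \xi)\, s(jT + iT + \xi + \gamma) \tag{5.68} $$

where (see 5.17):

$$ \tau = iT + \gamma, \quad i = 0, \pm 1, \pm 2, \ldots, \quad 0 \le t + \tau \le NT \tag{5.69} $$

and $\xi$ is uniformly distributed over $[T_{\eta-1}, T_\eta]$ and $|\gamma| < \Delta_\eta$. For
$1 \le \eta < M$, it is clear that $\xi + \gamma < T$. Utilizing (5.19), we obtain:

$$ E\, q_\eta(t) q_\eta(t + \tau)
 = \frac{1}{\Delta_\eta} \int_{T_{\eta-1}}^{T_\eta} E_s\, s(jT + \sigma)\, s(jT + iT + \sigma + \gamma)\, d\sigma
 = \frac{1}{\Delta_\eta} \int_{T_{\eta-1}}^{T_\eta} R(\gamma, i)\, d\sigma
 = R(\gamma, i) \tag{5.70} $$

However, for $\xi \in [T_{M-1}, T_M] = [T_{M-1}, T]$, $\xi + \gamma$ may no longer be less than
$T$. Utilizing Eq. (5.19) once more, we get:

$$ E\, q_M(t) q_M(t + \tau)
 = \frac{1}{\Delta_M} \int_{T_{M-1}}^{T} E_s\, s(jT + \sigma)\, s(jT + iT + \sigma + \gamma)\, d\sigma $$

$$ = \frac{1}{\Delta_M} \int_{T_{M-1}}^{T - \gamma} R(\gamma, i)\, d\sigma
 + \frac{1}{\Delta_M} \int_{T - \gamma}^{T} R(\gamma - T, i + 1)\, d\sigma
 = \frac{\Delta_M - \gamma}{\Delta_M}\, R(\gamma, i) + \frac{\gamma}{\Delta_M}\, R(\gamma - T, i + 1) \tag{5.71} $$

where $|\gamma| < \Delta_M$, which concludes the proof.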

Let $S_\eta$ be defined as follows:

$$ S_\eta \triangleq \{t : t \in (jT + T_{\eta-1},\; jT + T_{\eta-1} + \Delta_\eta),\; j = 0, 1, \ldots, N - 1\} \tag{5.72} $$

where $\Delta_\eta$ is defined by Eq. (5.66). Hence, the entire picture consists of the
collection of partitions $S_1, S_2, \ldots, S_M$, as shown in Figure 5-4.

Let $\theta(t)$ be the observation given by:

$$ \theta(t) = s(t) + v(t), \quad t \in S_\eta \tag{5.73} $$

Fig. 5-4. Partitioned Image

where $v(t)$ is the white noise of zero mean and variance $\sigma^2$. Now we can state
a very important result via a theorem.

Theorem 4

The second-order statistical information of $s(t)$ and $\theta(t)$ for $t \in S_\eta$ is
sufficient for obtaining the best linear mean square estimate of $s(t)$, denoted as
$\hat{s}(t)$, given the observation $\theta(t)$, $t \in S_\eta$. The optimal solution is unique and
independent of the particular generating model of the signal process $s(t)$.

Proof

Let $L(\alpha(\tau), t)$ be the operator defined by:

$$ L(\alpha(\tau), t)\, \theta(\tau) = \int_0^{T_1} \alpha(\tau)\theta(\tau)\, d\tau
 + \int_T^{T + T_1} \alpha(\tau)\theta(\tau)\, d\tau + \cdots
 + \int_{jT}^{t} \alpha(\tau)\theta(\tau)\, d\tau \tag{5.74} $$

where $\alpha(\tau)$ is a scalar function. We are interested in minimizing:

$$ E[s(t) - \hat{s}(t)]^2, \quad t \in S_1 \tag{5.75} $$

where $\hat{s}(t)$ is restricted to a linear function of the observation $\theta(\tau)$, $\tau \le t$, with
both $\tau$ and $t$ belonging to $S_1$. Consequently, $\hat{s}(t)$ has the form given by
Eq. (5.74).

It is desired to find that $\alpha(\tau)$, denoted by $\alpha^{\circ}(\tau)$, which will minimize
(5.75). Using the ideas of calculus of variations [8], let $\alpha^{\circ\circ}(\tau)$ be any
arbitrary function of $\tau$ and $\epsilon$ be an arbitrary small scalar. Letting

$$ \alpha(\tau) = \alpha^{\circ}(\tau) + \epsilon\, \alpha^{\circ\circ}(\tau) $$

and substituting this in Eq. (5.75) yields:

$$ E\{s(t) - L(\alpha^{\circ}(\tau) + \epsilon\, \alpha^{\circ\circ}(\tau), t)\, \theta(\tau)\}^2 \tag{5.76} $$

where the expectation is over $s$ and $v$.

If $\alpha^{\circ}(\tau)$ yields the minimum value for Eq. (5.75), then the coefficient of
the term in $\epsilon$ in the expansion of Eq. (5.76) must be zero, since $\epsilon$ can be
chosen small and with arbitrary sign. It follows that:

$$ E\, L(\alpha^{\circ\circ}(\tau), t)\, [\theta(\tau)(s(t) - \hat{s}(t))] = 0 $$

Or, in the expanded form,

$$ E\left\{ \int_0^{T_1} \alpha^{\circ\circ}(\tau)\theta(\tau)[s(t) - \hat{s}(t)]\, d\tau + \cdots
 + \int_{jT}^{t} \alpha^{\circ\circ}(\tau)\theta(\tau)[s(t) - \hat{s}(t)]\, d\tau \right\} = 0 $$

Since the above equation must be satisfied for any $\alpha^{\circ\circ}(\tau)$, we must
necessarily have:

$$ E[s(t) - \hat{s}(t)]\, \theta(\tau) = 0, \quad \text{for } 0 \le \tau < T_1 $$

$$ E[s(t) - \hat{s}(t)]\, \theta(\tau) = 0, \quad \text{for } T \le \tau < T + T_1 $$

$$ \vdots $$

$$ E[s(t) - \hat{s}(t)]\, \theta(\tau) = 0, \quad \text{for } jT \le \tau \le t $$

which is the orthogonality principle, that is,

$$ E[s(t) - \hat{s}(t)]\, \theta(\tau) = 0, \quad \text{for } \tau, t \in S_1 \text{ and } \tau \le t \tag{5.77} $$

The solution of Eq. (5.77) yields the optimal solution $\hat{s}(t)$. Equivalently,
Eq. (5.77) may be written as:

$$ E\, s(t)\theta(\tau) = E\, \hat{s}(t)\theta(\tau) $$

where $\hat{s}(t)$ is given by (5.74). Hence, we have:

$$ E\, s(t)\theta(\tau) = \int_0^{T_1} \alpha^{\circ}(\rho)\, E\,\theta(\rho)\theta(\tau)\, d\rho
 + \int_T^{T + T_1} \alpha^{\circ}(\rho)\, E\,\theta(\rho)\theta(\tau)\, d\rho + \cdots
 + \int_{jT}^{t} \alpha^{\circ}(\rho)\, E\,\theta(\rho)\theta(\tau)\, d\rho $$

which implies that the optimal solution depends on the second moment
statistics of $s(t)$ and $\theta(t)$ over $S_1$ only.


Example 7

Consider a square picture subdivided into a 32 × 32 grid. Let $T$ = 1 second
and v = 1. The signal is a 20 × 20 square starting from the thirteenth row and
the first column. Let $m$ and $n$ represent specific rows and columns, respectively.
The above signal is represented by the brightness level $b(m, n) = 1.56$,
where a signal exists, and $-1$ otherwise, resulting in a zero mean sample
function. As a first approximation, let us choose:

$$ R(z, i) = \alpha \exp(-\mu_h|z| - \mu_v|i|) $$

where $\alpha$, $\mu_h$, and $\mu_v$ are to be determined. Computation of the sample power
gives rise to $\alpha = R(0,0) = 1.56$. The correlation between two adjacent grid
points is calculated as 1.394, which is the value for $R(1/32, 0)$ or $R(0, 1)$.
Hence,

$$ R(z, i) = 1.56 \exp(-3.44|z| - 0.107|i|) $$
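The sample statistics quoted in this example can be reproduced with a short Python sketch. It assumes the grid is 32 × 32 (as implied by the argument 1/32 in the text) and that rows and columns are counted from 1; the function name `R` and its default parameters simply restate the fitted model.

```python
import numpy as np

N = 32
image = -np.ones((N, N))
image[12:32, 0:20] = 1.56          # 20 x 20 square from row 13, column 1

mean = image.mean()                # zero-mean sample function
power = (image ** 2).mean()        # sample power, compared with R(0,0)

def R(z, i, alpha=1.56, mu_h=3.44, mu_v=0.107):
    """Fitted separable exponential correlation model of Example 7."""
    return alpha * np.exp(-mu_h * abs(z) - mu_v * abs(i))

print(mean, power, R(1.0 / N, 0), R(0, 1))
```

With these assumptions the sample mean vanishes and the sample power equals the brightness level 1.56, and the model gives nearly the same value one grid step away horizontally and vertically.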


Example 8

Let us partition the above picture into three parts $S_1$, $S_2$, and $S_3$, where
$S_\eta$ is given by Eq. (5.72). We subdivide $(0, 1]$ as follows:

$$ 0 = T_0 < T_1 < T_2 < T_3 = 1 $$

with

$$ \Delta_1 = \frac{10}{32} \quad \text{and} \quad \Delta_2 = \frac{10}{32} $$

Then, $E\, q_1(t) q_1(t + \tau)$ for $t$ and $t + \tau \in S_1$ can be calculated by utilizing Eq.
(5.70) and is given by:

$$ E\, q_1(t) q_1(t + \tau) = 1.56 \exp(-3.44|\gamma| - 0.107|i|) $$

Similarly, $E\, q_2(t) q_2(t + \tau)$ for $t$, $t + \tau \in S_2$ is given by:

$$ E\, q_2(t) q_2(t + \tau) = 1.56 \exp(-3.44|\gamma| - 0.107|i|) $$

$E\, q_3(t) q_3(t + \tau)$ for $t$, $t + \tau \in S_3$ can be calculated from (5.71) and is given
by:

$$ E\, q_3(t) q_3(t + \tau) = \frac{\Delta_3 - \gamma}{\Delta_3}\, 1.56 \exp(-3.44|\gamma| - 0.107|i|)
 + \frac{\gamma}{\Delta_3}\, 1.56 \exp(-3.44|\gamma - 1| - 0.107|i + 1|) $$

5.4.1 Dynamic Modeling of Image Statistics 

Now, for any $1 \le \eta \le M$, we wish to derive a differential equation model
whose solution has an autocorrelation function approximating
$E\, q_\eta(t) q_\eta(t + \tau)$. We subsequently intend to utilize a Kalman estimator for each
$\eta$, whenever the signal $q_\eta(t)$ is contaminated by additive white noise. But, from
Theorem 4, the linear minimum mean square estimate $\hat{s}(t)$ is independent of
the particular dynamic model generating the signal process $q_\eta(t)$. Hence, it is
sufficient to devise any stationary correlation function which matches the first
two moments of $q_\eta(t)$ for $t \in S_\eta$.

Again, without any loss of generality, we let $\eta = 1$, since the analysis
would be similar for $\eta > 1$. Let the dynamic model

$$ \dot{x} = A_1 x(t) + B_1 u(t) $$

$$ y(t) = C_1 x(t) \tag{5.78} $$

be such that its output correlation function, denoted as $\phi_1(\tau)$, satisfies:

$$ \phi_1(\tau) = E\, q_1(t) q_1(t + \tau), \quad \text{for } t, t + \tau \in S_1 \tag{5.79} $$

where $x(t)$ is an $n$-dimensional vector, $u(t)$ is a white noise vector, and $y(t)$ is
the scalar signal whose autocorrelation function is $\phi_1(\tau)$. The procedure
followed is to present an approximation to $\phi_1(\tau)$, denoted as $\phi_{1a}(\tau)$, as a sum of
terms such that each term can easily be modeled. The procedure has been
discussed; however, we shall repeat it for the sake of completeness.

Let us decompose $\phi_1(\tau)$ into the product of two functions $\xi_1(\tau)$ and
$\phi_1(\tau)/\xi_1(\tau)$, where $\xi_1(\tau)$ is chosen to satisfy $\xi_1(iT) = R(0, i)$ for all $i$, and
$\xi_1(\tau)$ is taken to be a combination of non-negative exponentials, i.e.,

$$ \xi_1(\tau) = \sum_{l=1}^{I} \beta_l \exp(-\lambda_l|\tau|) $$

The function $p_1(\tau)$ is chosen to be a periodic function approximating
$\phi_1(\tau)/\xi_1(\tau)$. The approximate correlation function is then

$$ \phi_{1a}(\tau) = \xi_1(\tau)\, p_1(\tau) \tag{5.80} $$

A natural candidate for $p_1(\tau)$ is:

$$ p_1(\tau) = \sum_{j=0}^{J} a_j \cos\frac{2\pi j}{T}\tau \tag{5.81} $$

Hence, an element of the correlation function $\phi_{1a}(\tau)$ has the form:

$$ a_j \beta_l \exp(-\lambda_l|\tau|) \cos\frac{2\pi j}{T}\tau $$

and there are $(J + 1)I$ such elements. A differential equation model with
white noise input can simply be constructed to model each of these terms.
Each will be a second-order system except those corresponding to $j = 0$, which
will be of the first order. If the white noise terms are assumed to be mutually
independent, the collection of all these differential equations defines $A_1$, $B_1$,
and $C_1$ and represents the desired model for $\phi_{1a}(\tau)$.


Example 9

In Example 8, due to the exponential nature of $R(z, i)$, we choose:

$$ \xi_1(\tau) = R(0,0) \exp(-0.107|\tau|) $$

Only three terms in (5.81) are retained; that is, $J = 2$. The resultant $\phi_{1a}(\tau)$ is:

$$ \phi_{1a}(\tau) = 1.56 \exp(-0.107|\tau|) \sum_{j=0}^{2} a_j \cos 2\pi j\tau $$

where $a_0 = 0.396$, $a_1 = 0.445$, and $a_2 = 0.0131$. The autocorrelation term:

$$ 1.56\, a_0 \exp(-0.107|\tau|) $$

is modeled by $x_1(t)$, where

$$ \dot{x}_1 = -0.107\, x_1 + 0.365\, u_1 $$

The second term in the correlation $\phi_{1a}(\tau)$ is modeled by:

$$ \dot{x}_2 = x_3 + 0.368\, u_2 $$

$$ \dot{x}_3 = -39.4\, x_2 - 0.214\, x_3 + 2.42\, u_2 $$

The third term is modeled in a similar manner. The $u_1$, $u_2$, and $u_3$ represent
independent white noise terms, each with zero mean and correlation function
$\delta(\tau)$, where $\delta$ is the Dirac delta function. The final results are:

$$ A_1 = \begin{bmatrix}
-0.107 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & -39.4 & -0.214 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & -157.7 & -0.214
\end{bmatrix} $$

$$ C_1 = [\,1 \;\; 1 \;\; 0 \;\; 1 \;\; 0\,] $$

The dynamic model generating the signal process $s_2(t)$ is identical to that of
$s_1(t)$. However, the dynamic model corresponding to the signal process $s_3(t)$ is
given by:

$$ \dot{x} = A_3 x(t) + B_3 u(t) $$

$$ y(t) = C_3 x(t) $$

where
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5.4.2 Design of Estimator 

From Eqs. (5.70) and (5.71), it follows that two different dynamic models
corresponding to the correlation functions exist, one for $1 \le \eta < M$ and the
other for $\eta = M$. In what follows we intend to utilize a digital computer for
the estimation process. The model corresponding to $1 \le \eta < M$ is given by
Eq. (5.78). For $\eta = M$, let the corresponding dynamic model be given by:

$$ \dot{x} = A_M x(t) + B_M u(t) $$

$$ y(t) = C_M x(t) \tag{5.82} $$

i.e., the dynamic model generates the signal process $y(t)$. Let us assume that
both dynamic models, given by Eq. (5.78) and Eq. (5.82), are of the same
dimensions. Discretizing Eq. (5.78) yields:

$$ x(k+1) = \bar{A}_1 x(k) + \bar{B}_1 u(k) $$

$$ y(k) = \bar{C}_1 x(k) + v(k) \tag{5.83} $$

In addition, the model given by Eq. (5.83) contains the observation
(background) noise element $v(k)$, which is assumed to be white of zero mean and
variance $\sigma^2$. The parameters $\bar{A}_1$, $\bar{B}_1$, $\bar{C}_1$ are related to $A_1$, $B_1$, and $C_1$ by:

$$ \bar{A}_1 = \exp\!\left(A_1 \frac{T}{N}\right) $$

$$ \bar{B}_1 \bar{K}_1 \bar{B}_1' = \int_0^{T/N} \exp\!\left(A_1 \frac{T}{N}\right) \exp(-A_1 s)\, B_1 K_1 B_1' \exp(-A_1' s) \exp\!\left(A_1' \frac{T}{N}\right) ds $$

$$ \bar{C}_1 = C_1 \tag{5.84} $$

where exp is the matrix exponential, and $\bar{K}_1$ and $K_1$ are covariances of $u(k)$ and
$u(t)$, respectively. We discretize Eq. (5.82) in the same manner. Let

$$ x(k+1) = \bar{A}_M x(k) + \bar{B}_M u(k) $$

$$ y(k) = \bar{C}_M x(k) + v(k) \tag{5.85} $$

with its corresponding parameters given by:

$$ \bar{A}_M = \exp\!\left(A_M \frac{T}{N}\right) $$

$$ \bar{B}_M \bar{K}_M \bar{B}_M' = \int_0^{T/N} \exp\!\left(A_M \frac{T}{N}\right) \exp(-A_M s)\, B_M K_M B_M' \exp(-A_M' s) \exp\!\left(A_M' \frac{T}{N}\right) ds $$

$$ \bar{C}_M = C_M \tag{5.86} $$


Example 10

In this example, let $\hat{y}(k)$ denote the estimate of $\bar{C}_1 x(k)$ or $\bar{C}_M x(k)$. Every $k$
can be written as $k = 32i + j$, for $i = 1, 2, \ldots, N$ and $1 \le j \le 32$, where $i$ is the
$i$th scanned line and $j$ determines the position on the $i$th scanned line. Continuing
Examples 7-9, we can see that the start of the three vertical strips corresponds to
the values of $j$ = 1, 11, or 21. For $1 \le j < 21$, we utilize model Eq. (5.83), since
the values of $\eta$ would be either 1 or 2. For other values of $j$, we utilize model Eq.
(5.85). Now for the values of $j$ = 1, 11, and 21, the best linear mean square
estimate of $y(k)$ must be the optimal combination of $\hat{y}(k)$ and $\hat{y}(k - 32)$, where
the two estimates use a portion of the observation twice. However, the
overlapped portion of the observation is very small, and the optimality will not
be significantly affected by assuming the estimators to be independent.

The formula for combining two independent estimates $\hat{x}$ and $\bar{x}$ of the same
state variable $x$ to obtain a combined estimate $x^*$ with its associated error
covariance $P^*$ is given by (see Chapter 4):

$$ x^* = P^*(P^{-1}\hat{x} + \bar{P}^{-1}\bar{x}) \tag{5.87} $$

$$ P^* = (P^{-1} + \bar{P}^{-1})^{-1} \tag{5.88} $$

where $P$ and $\bar{P}$ are the error covariances of $\hat{x}$ and $\bar{x}$, respectively. Thus, applying
Eqs. (5.87) and (5.88) to $\hat{y}(k) = \bar{C}\hat{x}(k)$ and $\bar{y}(k) = \bar{C}\bar{x}(k)$ yields:

$$ y^*(k) = \frac{\bar{\sigma}^2(k)}{\sigma^2(k) + \bar{\sigma}^2(k)}\, \hat{y}(k)
 + \frac{\sigma^2(k)}{\sigma^2(k) + \bar{\sigma}^2(k)}\, \bar{y}(k) $$

where $y^*$ denotes the combined estimate for $y(k)$. Continuing Example 9, we
obtain:

$$ \bar{A}_1 = \bar{A}_M = \begin{bmatrix}
0.996 & 0 & 0 & 0 & 0 \\
0 & 0.983 & 0.031 & 0 & 0 \\
0 & -1.223 & 0.970 & 0 & 0 \\
0 & 0 & 0 & 0.926 & 0.03 \\
0 & 0 & 0 & -4.75 & 0.93
\end{bmatrix} $$

$$ \bar{B}_1 \bar{K}_1 \bar{B}_1' = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0.01 & 0.03 & 0 & 0 \\
0 & 0.03 & 0.15 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.01
\end{bmatrix} $$

$$ \bar{B}_M \bar{K}_M \bar{B}_M' = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.02 & 0 & 0 \\
0 & 0.02 & 0.11 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.02 \\
0 & 0 & 0 & 0.02 & 0.14
\end{bmatrix} $$

$$ \bar{C}_1 = \bar{C}_M = [\,1 \;\; 1 \;\; 0 \;\; 1 \;\; 0\,] $$


Utilizing the models given by Eqs. (5.83) and (5.85) with their corresponding
parameters given by Eqs. (5.84) and (5.86) respectively, a (one-step predictor)
recursive estimator for each system may now be designed [8]. The equations for
(5.83) are given for the sake of completeness:

$$ \hat{x}(k+1) = [\bar{A}_1 - F_1(k)\bar{C}_1]\, \hat{x}(k) + F_1(k)\, y(k) $$

$$ P(k+1) = [\bar{A}_1 - F_1(k)\bar{C}_1]\, P(k)\, [\bar{A}_1 - F_1(k)\bar{C}_1]' + \bar{B}_1\bar{K}_1\bar{B}_1' + F_1(k)F_1'(k)\, \sigma^2 $$

$$ F_1(k) = \bar{A}_1 P(k) \bar{C}_1'\, [\bar{C}_1 P(k) \bar{C}_1' + \sigma^2]^{-1} $$

A similar set of equations exists for (5.85), the only difference being a change
of subscripts from 1 to $M$.

The one-step predicted estimate $\hat{y}(k)$ of $y(k)$ is found recursively in real
time. $y(k)$ is the observation associated with the grid point immediately ahead of
the scanner position.

Example 11

The signal $y(t)$ or $y(k)$ is generated by using the image described in the
preceding example and by adding white noise with variance $\sigma^2$. The
peak-to-peak variation is 2.56. Let us select as a measure of signal-to-noise ratio:

$$ \rho = \frac{\text{peak-to-peak variation of signal}}{\sigma} $$

A value of $\rho$ of 2.56/10, which represents a very noisy image, was utilized.
Figure 5-5 represents the uncontaminated image, where the corresponding values
of $y(k)$ and their one-step predictors are shown in Figures 5-6a and 5-6b,
respectively.

Fig. 5-5. Uncontaminated Image

Example 12

Since the length of data is fixed and the observation is available for additional
repeated processing, it is possible to obtain two one-step predicted values of
$y(k)$, denoted as $\hat{y}(k)$ and $\bar{y}(k)$, one by running the scanner starting from the top
left corner of the image and the other by running the scanner in the reverse
direction starting at the bottom right corner. Associated with these estimates are
estimation error variances denoted by $\sigma^2(k) = \bar{C}P(k)\bar{C}'$ and
$\bar{\sigma}^2(k) = \bar{C}\bar{P}(k)\bar{C}'$, respectively. The result of combining the two estimates
for $\rho$ = 2.56/10 appears in Figure 5-6c.




5.5 CONCLUSIONS 

The role of recursive (Kalman) filtering in image processing has been
established. The procedure is applicable to those images characterized
statistically by their mean and correlation function. A recursive estimation
approach is very desirable due to its computational advantages. The effectiveness
and the computational simplicity of our method to enhance contaminated images
have been demonstrated via examples.


DIRAC DELTA FUNCTION 


We have often seen the “delta” function $\delta(x)$ described as:

$$ \int_{-\infty}^{\infty} \delta(x)\, dx = 1, \quad \delta(x) = 0, \text{ for } x \ne 0 $$

We must point out that $\delta(x)$ is not a function, but a mathematical symbol.
We shall discuss the definition of $\delta(x)$ below.

Definition 1

A function $\phi(t)$, which is differentiable infinitely many times, is said to
belong to class $C$ or, symbolically, $\phi \in C$, if the following condition is satisfied:

$$ \lim_{|t| \to \infty} |t^i \phi^{(j)}(t)| = 0, \quad \text{for all } i \text{ and } j \ge 0 $$

Note that $\phi^{(j)}$ denotes the $j$th derivative.

Now we need to define another expression.
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Definition 2 

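The sequence of functions $g_1(t), g_2(t), \ldots$ of class $C$ is said to be regular if
for any function $\phi(t) \in C$:

$$ \lim_{n \to \infty} (g_n, \phi) \triangleq \lim_{n \to \infty} \int_{-\infty}^{\infty} g_n(t)\phi(t)\, dt $$

is finite.

Example

Consider the sequence

$$ \left\{ \sqrt{\frac{n}{\pi}} \exp(-nt^2) \right\} \triangleq \{g_n(t)\} $$

The function $g_n(t)$ is of class $C$ and

$$ \lim_{n \to \infty} \sqrt{\frac{n}{\pi}} \exp(-nt^2) = \infty \quad \text{at } t = 0 $$

However, for any function $\phi \in C$,

$$ \lim_{n \to \infty} (g_n, \phi) $$

is finite.

Definition 3

Two regular sequences of functions $\{g_n(t)\}$ and $\{h_n(t)\}$ are equivalent if

$$ \lim_{n \to \infty} (g_n(t), \phi) = \lim_{n \to \infty} (h_n(t), \phi) $$

We shall denote $g_n \sim h_n$.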
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For example. 


| N /| e xp <-„>)} »d {-J_ 


exp 



are equivalent, even though the functions are not equal to each other. 


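Definition 4

If the limit of $\{g_n(t)\}$ (with respect to a function $\phi \in C$) converges to a
function $g$, i.e.,

$$ (g, \phi) = \lim_{n \to \infty} (g_n, \phi) $$

then $g$ is called a generalized function and $g \sim \{g_n\}$. A generalized function
denoted by $u$ is called a unit step function if

$$ \lim_{n \to \infty} \int_{-\infty}^{\infty} u_n(t)\phi(t)\, dt = \int_{-\infty}^{\infty} u(t)\phi(t)\, dt $$

for all classes of $\{u_n(t)\}$, where

$$ u(t) = \begin{cases} 1, & \text{if } t > 0 \\ 0, & \text{if } t < 0 \end{cases} $$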

Example 

The sequence 


j exp - 

;(M] 

. if ? > 0 

«„<'> = \ L 

n\r r j 


(o. 


if t <0 


represents a generalized unit step function. 
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Definition S 


The unit impulse or Dirac delta function $\delta(t)$ is defined as:

$$ \delta \sim \{u_n'(t)\} $$

That is,

$$ (\delta, \phi) = \lim_{n \to \infty} (u_n', \phi) $$

It should be emphasized that $\delta(t)$ is merely a symbol representing the total
class of equivalent regular sequences $\{u_n'(t)\}$. Hence,

$$ \int_{-\infty}^{\infty} \delta(t)\phi(t)\, dt = \lim_{n \to \infty} \int_{-\infty}^{\infty} u_n'(t)\phi(t)\, dt $$


Example 

The sequence {w n (f)} given by: 




D ■ 2i ] txp [ »(M] ■ 


0 , 


if f >0 


if f <0 


is only one sequence which represents 6(f). Other sequences are: 
exp (-rtf 2 ) 




etc. 


The following important properties of $\delta(t)$ hold:

$$ (1) \quad \int_{a<0}^{b>0} \delta(t) f(t)\, dt = f(0) $$

where $f$ is differentiable over the interval $a < t < b$.

$$ (2) \quad \int_{-\infty}^{\infty} f(t)\, \delta(t - a)\, dt = f(a) $$

Both equations can be proven from the definition and utilizing
integration by parts.

$$ (3) \quad \int_{a<0}^{b>0} \delta(t)\, dt = 1, \quad \delta(t) = 0, \; t \ne 0 $$

APPENDIX B 

VECTOR SPACES AND MATRICES 

Definition 1 

Let $V$ be a set; then $V$ is called a linear vector space over the real or the
complex field if the following rules are satisfied:

(1) If $x \in V$, $y \in V$, then $x + y \in V$

(2) $(x + y) + z = x + (y + z)$

(3) There exists a “zero” vector $0 \in V$ such that $x + 0 = 0 + x = x$ for
every $x \in V$

(4) For every $x \in V$, there exists another $x^- \in V$ such that $x + x^- = 0$

(5) $x + y = y + x$ for all $x$ and $y \in V$

There exists a set of scalars (either real $R$ or complex $C$) denoted by Greek
letters such that:

(6) $(\alpha + \beta)x = \alpha x + \beta x$ (Distributive Law)

(7) $\alpha(x + y) = \alpha x + \alpha y$ (Distributive Law)

(8) $(\alpha\beta)(x) = \alpha(\beta x)$ (Associative Law)

(9) $1 \cdot x = x$

(10) $0 \cdot x = 0$

The most important example of the vector space is $R^n$. It can be shown
that a set $V$ is a vector space iff for any $x, y \in V$ and any scalars $\alpha$ and $\beta$,
$\alpha x + \beta y \in V$.
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Definition 2 

Let $V$ and $W$ be linear vector spaces over the same field of scalars, and let
$T$ be a mapping (transformation) $V \to W$ such that:

(1) $T(x + y) = Tx + Ty$ for all $x$ and $y \in V$

(2) $T(\alpha x) = \alpha Tx$ for all $x \in V$ and all scalars $\alpha$

Then $T$ is said to be linear.

Definition 3

A set of vectors $\{x_1, x_2, \ldots, x_n\}$ is a basis in $V$ if:

(1) The set is linearly independent (no $x_i$ can be written as a linear
combination of the other vectors).

(2) They generate the vector space $V$, i.e., every $x \in V$ can be written as a
linear combination of $x_1, x_2, \ldots, x_n$.

Definition 4

The number of linearly independent vectors $n$ in Definition 3 is called the
dimension of the vector space $V$.

Recall that a linear transformation $A$ satisfies:

(1) $A(x + y) = A(x) + A(y)$, for any $x$ and $y \in V$

(2) $A(\alpha x) = \alpha A(x)$, for any scalar $\alpha$ and $x \in V$

Definition of Matrices

Let $\{e_1, e_2, \ldots, e_n\}$ be a basis in $V$ and $\{f_1, f_2, \ldots, f_m\}$ be a basis in $W$,
and assume $A$ is a linear transformation

$$ A: V \to W $$

Then $A(e_j) \in W$ for all $j = 1, 2, \ldots, n$, which implies that:

$$ A(e_j) = \sum_{i=1}^{m} a_{ij} f_i \tag{B.1} $$

or in the expanded form:

$$ A(e_1) = a_{11} f_1 + a_{21} f_2 + \cdots + a_{m1} f_m $$

$$ A(e_2) = a_{12} f_1 + a_{22} f_2 + \cdots + a_{m2} f_m $$

$$ \vdots $$

$$ A(e_n) = a_{1n} f_1 + a_{2n} f_2 + \cdots + a_{mn} f_m $$

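Definition 5

Now the matrix of $A$, denoted by $M_A$, with respect to the above basis is
defined as:

$$ M_A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix} = [a_{ij}]_{m \times n} $$

Thus the matrix $[a_{ij}]_{m \times n}$ depends on the linear transformation $A$ as well as
the bases in $V$ and $W$.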

Let A and B be linear transformations with respect to the same spaces; 
then the reader is advised to prove the following properties: 

(I) *4 

where M A and A/ are the matrices with respect to the operators A and B y a 
is a scalar, and (<L + b l mXw is the matrix with respect to the operator 
A+B 

Definition 6

(1) If $Ax = x$ for every $x$, then the operator is called the identity and is
denoted by $I$.

(2) $A$ is a zero operator if $Ax = 0$ for every $x \in V$.

(3) If $V = W$, then $A$ is said to be invertible iff $Ax_1 = Ax_2$ implies
$x_1 = x_2$ and, for every $y \in V$, there exists an $x \in V$ such that $Ax = y$. If $A$
is not invertible it is said to be singular.

Thus, for a zero operator $A$, the corresponding matrix will have zero
entries. It can also be shown that $A$ is invertible (nonsingular) iff $Ax = 0$
implies $x = 0$.

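Let $V$, $W$, and $A$ be as before; then for every vector $x \in V$:

$$ x = \sum_{j=1}^{n} \xi_j e_j \tag{B.2} $$

where the $\xi_j$ are scalars, called the “coordinates.” Since $Ax \in W$, then:

$$ Ax = \sum_{i=1}^{m} \gamma_i f_i \tag{B.3} $$

Now we can claim the following important result via a theorem.

Theorem 1

If we designate $\Gamma$ and $\Xi$ by:

$$ \Gamma = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}
 \quad \text{and} \quad
 \Xi = \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{bmatrix} $$

Then the following is true:

$$ \Gamma = M_A\, \Xi $$

or, equivalently,

$$ \gamma_i = \sum_{j=1}^{n} a_{ij}\, \xi_j, \quad i = 1, 2, \ldots, m $$

Proof

$$ Ax = A \sum_{j=1}^{n} \xi_j e_j = \sum_{j=1}^{n} \xi_j A e_j
 = \sum_{j=1}^{n} \xi_j \sum_{i=1}^{m} a_{ij} f_i
 = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij}\, \xi_j \right) f_i $$

where Eq. (B.1) has been used. Now the above equation equated with Eq.
(B.3) yields the result.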


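Definition 7

Let $\{e_1, e_2, \ldots, e_n\}$ and $\{h_1, h_2, \ldots, h_n\}$ be bases in $V$. Since $h_j \in V$ for
all $j = 1, 2, \ldots, n$ and $\{e_1, e_2, \ldots, e_n\}$ is a basis, then

$$ h_j = \sum_{i=1}^{n} p_{ij}\, e_i, \quad j = 1, 2, \ldots, n \tag{B.4} $$

Now the matrix $P = [p_{ij}]_{n \times n}$ is called the matrix of transition from the basis
$\{e_1, e_2, \ldots, e_n\}$ to the basis $\{h_1, h_2, \ldots, h_n\}$.

Let $Q$ denote the matrix of transition from $\{h_1, h_2, \ldots, h_n\}$ to
$\{e_1, e_2, \ldots, e_n\}$; then it is simple to verify that:

$$ Q = P^{-1} \tag{B.5} $$

Let $x \in V$, then

$$ x = \sum_{i=1}^{n} \xi_i e_i = \sum_{i=1}^{n} \gamma_i h_i $$

where $\xi_i$'s and $\gamma_i$'s are coordinates.

It would be very easy to verify that:

$$ \xi_i = \sum_{j=1}^{n} p_{ij}\, \gamma_j \tag{B.6} $$

or, equivalently,

$$ \begin{bmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{bmatrix}
 = \begin{bmatrix}
 p_{11} & p_{12} & \cdots & p_{1n} \\
 p_{21} & p_{22} & \cdots & p_{2n} \\
 \vdots & \vdots & & \vdots \\
 p_{n1} & p_{n2} & \cdots & p_{nn}
 \end{bmatrix}
 \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_n \end{bmatrix} \tag{B.7} $$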


With the above background, we are ready to state a major result in linear
algebra given via Theorem 2. The proof will not be given here.

Theorem 2

Let $T$ be a linear transformation from $V \to W$ and let $\{e_1, e_2, \ldots, e_n\}$ and
$\{e_1', e_2', \ldots, e_n'\}$ be bases in $V$ and let $\{f_1, f_2, \ldots, f_m\}$ and
$\{f_1', f_2', \ldots, f_m'\}$ be bases in $W$.

Let $M_T$ denote the matrix of $T$ with respect to bases $\{e_1, e_2, \ldots, e_n\}$ and
$\{f_1, f_2, \ldots, f_m\}$ and $M_T'$ be a matrix with respect to the bases
$\{e_1', e_2', \ldots, e_n'\}$ and $\{f_1', f_2', \ldots, f_m'\}$, respectively. Also let $S$ and $U$
denote the matrices of transition from $\{e_1, e_2, \ldots, e_n\}$ to $\{e_1', e_2', \ldots, e_n'\}$
and from $\{f_1, f_2, \ldots, f_m\}$ to $\{f_1', f_2', \ldots, f_m'\}$, respectively. Then,

$$ M_T' = U^{-1} M_T S \tag{B.8} $$

For proof see any linear algebra book.

Important Corollary

If $T: V \to V$, then

$$ M_T' = S^{-1} M_T S \tag{B.9} $$

Because $S = U$, its substitution in (B.8) will yield the result.
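The corollary can be checked numerically with the sketch below. It is an illustration under our own assumptions: random 4 × 4 matrices stand in for the matrix of T and for the transition matrix S, and coordinates are taken to transform as in Eq. (B.7).

```python
import numpy as np

rng = np.random.default_rng(0)
M_T = rng.standard_normal((4, 4))        # matrix of T in the basis {e_i}
S = rng.standard_normal((4, 4))          # transition matrix (assumed invertible)
M_T_new = np.linalg.solve(S, M_T @ S)    # S^{-1} M_T S without forming the inverse

# Old coordinates are S times new coordinates, so both routes must agree.
gamma = rng.standard_normal(4)
lhs = S @ (M_T_new @ gamma)              # map in the new basis, convert to old coordinates
rhs = M_T @ (S @ gamma)                  # convert to old coordinates, then map
print(np.allclose(lhs, rhs))
```

Similar matrices share their trace and determinant, which gives a second quick consistency check on the transformed matrix.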

Definition 8

We now define eigenvalues and eigenvectors, which are used often in our
analysis.

Let $A: V \to V$; then if

$$ Ax = \lambda x $$

where $x \in V$ and $\lambda$ is a scalar, then $x$ is called an eigenvector and $\lambda$ is called
the eigenvalue. In general, 0 is an eigenvalue iff $Ax = 0x = 0$ for some $x \ne 0$,
i.e., $A$ is singular (not invertible). If $A = I$ (identity operator), then
$Ix = x \leftrightarrow \lambda = 1$. In the definition $Ax = \lambda x$, we say $x$ belongs to $\lambda$.

Discussion

(1) If $Ax = \lambda x$, then

$$ Ax - \lambda x = 0 \leftrightarrow (A - \lambda I)x = 0 $$

Thus, $x$ is an eigenvector iff $(A - \lambda I)$ is a singular operator, which is equivalent
to saying that:

$$ \det(A - \lambda I) = |M_{(A - \lambda I)}| = 0 $$

(2) From now on, we shall use $A$ for $M_A$, if there is no confusion about
$M_A$ with respect to the specific basis, since there is a 1-1 correspondence
and onto mapping from $A$ to $M_A$ (isomorphism).


Definition 9 

If a matrix A * satisfies: 


A* = (a if ) T 

where the bar denotes the complex conjugate, and T denotes the transpose, 
then A* is said to be an adjoint matrix of A (operator). If A* * A, then A is 
said to be self-adjoint. 


Definition 10 

An inner product on a vector space $V$ (over the real or complex field) is a
complex number $(x, y)$ such that for every $x, y, z \in V$ and for any scalars $\alpha$ and $\beta$
the following are satisfied:

(1) $(x, y) = \overline{(y, x)}$

(2) $(\alpha x + \beta y, z) = \alpha (x, z) + \beta (y, z)$

(3) $(x, x) > 0$ iff $x \neq 0$


Definition 11 

The norm of a vector $x$, denoted by $\|x\|$, is defined via:

$$\|x\|^2 = (x, x)$$


Now we are ready to make a very important definition. 

Definition 12 

If $A = A^*$, then $(Ax, x)$ is said to be positive definite if

$$(Ax, x) > 0, \quad \text{for all } x \neq 0$$

and negative definite if

$$(Ax, x) < 0, \quad \text{for all } x \neq 0$$

Similarly, if $A$ satisfies

$$(Ax, x) \geq 0, \quad \text{for all } x \neq 0$$

then $A$ is said to be positive semi-definite; negative semi-definiteness
is defined in a similar manner.


Definition 13 

If $A$ is none of the above, $A$ is said to be indefinite, that is, $(Ax, x) > 0$
for some $x$ and $(Ax, x) < 0$ for another $x$.

Definition 14

The quadratic form of $A = A^*$ is defined via

$$Q(x) = (Ax, x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, \xi_j\, \bar{\xi}_i$$

where the $\xi_i$'s are the coordinates of the vector $x$.
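
The definitions of definiteness can be tried out numerically. The sketch below is illustrative only: it evaluates $(Ax, x)$ for the standard inner product $(x, y) = \sum_i \xi_i \bar{\eta}_i$ and classifies a 2 x 2 self-adjoint matrix through its two real eigenvalues, using the standard fact that $(Ax, x) > 0$ for all $x \neq 0$ exactly when every eigenvalue is positive. The function names and sample matrices are assumptions, not taken from the text.

```python
import math

def quadratic_form(A, x):
    """Q(x) = (Ax, x) = sum_i (Ax)_i * conj(x_i) for a square matrix A."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax[i] * x[i].conjugate() for i in range(n))

def classify_hermitian_2x2(A, tol=1e-12):
    """Classify a 2x2 self-adjoint matrix by the signs of its real eigenvalues."""
    a, d = A[0][0].real, A[1][1].real
    off = abs(A[0][1])
    tr, det = a + d, a * d - off * off
    # tr^2 - 4*det = (a - d)^2 + 4*|a_12|^2 is never negative for A = A*.
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    lo, hi = (tr - disc) / 2, (tr + disc) / 2
    if lo > tol:
        return "positive definite"
    if hi < -tol:
        return "negative definite"
    if lo >= -tol:
        return "positive semi-definite"
    if hi <= tol:
        return "negative semi-definite"
    return "indefinite"

# Illustrative matrices (assumptions made for this example).
kind_a = classify_hermitian_2x2([[2, 1], [1, 2]])
kind_b = classify_hermitian_2x2([[1, 2], [2, 1]])
kind_c = classify_hermitian_2x2([[1, 1], [1, 1]])
q = quadratic_form([[2, 1], [1, 2]], [1, -1])
```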

The above background should suffice to support the material in the text.

APPENDIX C

FOURIER AND BILATERAL LAPLACE TRANSFORMS 
AND THEIR INVERSIONS 


The power spectrum is the Fourier transform of the autocorrelation function of a
wide-sense stationary process. Thus, the manipulation of the Fourier transform and
its corresponding inverse is extremely important. If a function $f(t)$ has a Fourier
transform, it will also have a bilateral Laplace transform. The inverse of each
transform is unique; however, it is easier to obtain the inverse of a bilateral
Laplace transform. Thus, the procedure for obtaining the inverse Fourier transform
is to obtain the corresponding bilateral Laplace transform and apply the
inversion formula. In what follows, the Fourier and bilateral
Laplace transforms are discussed.

Before we get involved with the concepts, we need some mathematical tools 
such as definitions and theorems; however the proofs are not provided. 

Definition 1

A (complex) function $f(s)$ is analytic at $s_0$ if $f$ is single valued and
differentiable in a neighborhood of $s_0$.

Theorem 1 (Cauchy's Integral Theorem)

Given the function $f(s)$ such that $f$ is analytic at all points within and on
any closed curve $C$ in the complex plane, then

$$\oint_C f(s)\, ds = 0$$

where the integral designates the integral along the closed path $C$.

Theorem 2 (Cauchy's Integral Formula)

Let $f$ and $C$ be as above; then for any point $a$ which is an interior point in
$C$ the following is true:

$$f(a) = \frac{1}{2\pi j} \oint_C \frac{f(s)}{s - a}\, ds \tag{C.1}$$

The result is proven with the aid of Theorem 1. Thus, by Theorem 2, every
analytic function $f(s)$ is completely determined in the interior of a given closed
curve $C$ once the values of $f(s)$ are given on $C$ only. Next, the last two
theorems are extended to get an important result, which we shall give via
Theorem 3; but first we discuss singularities.


Definition 2 

If $f(s)$ is not analytic at a point $s_0$, then $s_0$ is called a singular point. If there
is a neighborhood of $s = s_0$ such that $f(s)$ has no other singular point, then $s_0$
is called an isolated singularity and, unless specified otherwise, all the
singularities in the appendix are isolated singularities.


Example 

$f(s) = 1/s$ has an isolated singularity at $s = 0$, since the neighborhood bounded
by the circle $|s| = \rho > 0$ contains no singularity other than 0. Similarly,

$$f(s) = \frac{s - 1}{s(s^2 + 4)}$$

has three isolated singularities at $s = 0$, $s = 2j$, $s = -2j$. The function

$$f(s) = \exp\left\{\frac{s}{1 - s^2}\right\}$$

has two isolated singular points at $s = 1$ and $s = -1$.

Note that in the first two cases, the singularities are poles, and in the third
case they are not poles.

Another Example

The function

$$f(s) = \frac{1}{\sin(1/s)}$$

has singularities at $s = \pm 1/(k\pi)$, $k = 1, 2, \ldots$ These singularities are isolated;
however at $s = 0$, the singularity is not isolated, regardless of how small the
radius $\rho$ of the circle $|s| = \rho$ may be.

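If $f(s)$ has an isolated singularity at $s = s_0$, then $f(s)$ can be represented via
the infinite series:

$$f(s) = a_0 + a_1 (s - s_0) + a_2 (s - s_0)^2 + \cdots + \frac{b_{-1}}{s - s_0} + \frac{b_{-2}}{(s - s_0)^2} + \cdots = \sum_{n=0}^{\infty} a_n (s - s_0)^n + \sum_{n=1}^{\infty} \frac{b_{-n}}{(s - s_0)^n} \tag{C.2}$$

The above series is called Laurent's series and $b_{-1}$ is called the residue of
$f(s)$ at the singularity $s = s_0$.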

Definition 4 

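A special case is where

$$f(s) = \frac{b_{-1}}{s - s_0} + \frac{b_{-2}}{(s - s_0)^2} + \cdots + \frac{b_{-m}}{(s - s_0)^m} + \sum_{n=0}^{\infty} a_n (s - s_0)^n \tag{C.3}$$

The (isolated) singularity $s = s_0$ is then called a pole of order $m$.

Remark. For Eq. (C.3), $b_{-1}$ is given by:

$$b_{-1} = \frac{1}{(m - 1)!} \lim_{s \to s_0} \frac{d^{m-1}}{ds^{m-1}} \left[ (s - s_0)^m f(s) \right] \tag{C.4}$$

If $m = 1$, then $s_0$ is said to be a simple pole and (C.4) reduces to:

$$b_{-1} = \lim_{s \to s_0} f(s)(s - s_0) \tag{C.5}$$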


Theorem 3 

Let $f(s)$ be analytic in the given region $R$ bounded by the closed curve $C$,
except at the isolated singularities $s_1, s_2, \ldots, s_m$ of $f(s)$ in the interior of $C$;
then

$$\oint_C f(s)\, ds = 2\pi j \sum_{k=1}^{m} (b_{-1})_k \tag{C.6}$$

where $(b_{-1})_k$ is the residue corresponding to $s_k$.

The result is called the residue theorem, which states that regardless of how
complicated the calculation of the integral of $f(s)$ around the contour is, it can be
obtained by the summation of all residues multiplied by $2\pi j$.

Equation (C.6) will play a major role in the inversion process of a transform.
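
The residue theorem can be checked numerically on the earlier example $f(s) = (s - 1)/[s(s^2 + 4)]$. The sketch below is a rough illustration, not part of the original development: it approximates the contour integral on a circle with an equally spaced sum and compares it with $2\pi j$ times the residues obtained from (C.5). The helper name `contour_integral` and the chosen radii are assumptions.

```python
import cmath
import math

def contour_integral(f, center, radius, n=2000):
    """Approximate the counterclockwise integral of f over the circle
    |s - center| = radius using n equally spaced points."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        s = center + radius * e
        total += f(s) * 1j * radius * e   # ds = j * radius * exp(j*theta) d(theta)
    return total * (2 * math.pi / n)

def f(s):
    return (s - 1) / (s * (s * s + 4))

# Residues at the simple poles, from (C.5): lim f(s) * (s - s0).
res_0 = -1 / 4                      # at s = 0
res_p = (2j - 1) / (2j * 4j)        # at s = 2j
res_m = (-2j - 1) / (-2j * -4j)     # at s = -2j

small = contour_integral(f, 0, 1.0)   # encloses s = 0 only
large = contour_integral(f, 0, 3.0)   # encloses all three poles
```

The circle of radius 1 should reproduce $2\pi j$ times the residue at $s = 0$, and the circle of radius 3 should reproduce $2\pi j$ times the sum of all three residues, which here happens to be zero.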


Definition 5 

Let $f(t)$ and $F_B(s)$ be functions related by:

$$F_B(s) = \int_{-\infty}^{\infty} f(t) \exp(-st)\, dt \tag{C.7}$$

Then we say $F_B(s)$ is the bilateral Laplace transform of $f(t)$, provided that
$F_B(s)$ exists in some region $\sigma_1 < \sigma < \sigma_2$, where $\sigma$ is the real part of $s$.





Theorem 4 


If $F_B(s)$ exists, then $f(t)$ can be obtained from:

$$f(t) = \frac{1}{2\pi j} \lim_{R \to \infty} \int_{abc} F_B(s) \exp(st)\, ds \tag{C.8}$$

where the path $abc$, $d$, and $R$ are given via the sketch and $\sigma_1 < d < \sigma_2$ (see sketch).



Proof

For the bilateral transform, the region of convergence is generally
given by $\sigma_1 < \sigma < \sigma_2$. However, for the one-sided Laplace transform, the
region of convergence is normally given by $\sigma > \sigma_0$.

For $t > 0$, we can show:

$$\lim_{R \to \infty} \int_{cefga} F_B(s) \exp(st)\, ds = 0 \tag{C.9}$$

and for $t < 0$:

$$\lim_{R \to \infty} \int_{cha} F_B(s) \exp(st)\, ds = 0 \tag{C.10}$$


Equations (C.9) and (C.10), together with (C.8), imply that $abc$ may be
changed to the closed path $abcefga$ for $t > 0$ and to the closed path $abcha$ for $t < 0$.
As $R \to \infty$, $abcefga$ encloses all the singularities to the left of $abc$ and $abcha$
encloses all the singularities to the right of $abc$, which implies we can directly
use the residue theorem (Theorem 3).

Thus, given $t > 0$,

$$f(t) = \frac{1}{2\pi j} \oint_{abcefga} F_B(s) \exp(st)\, ds = \sum_k (b_{-1})_k \tag{C.11}$$

where $(b_{-1})_k$ is the residue of $F_B(s) \exp(st)$ at the $k$th singularity to the left of $abc$. For
$t < 0$, $f(t)$ is given by:

$$f(t) = \frac{1}{2\pi j} \oint_{abcha} F_B(s) \exp(st)\, ds = -\sum_k (b_{-1})_k \tag{C.12}$$

where $(b_{-1})_k$ is the residue at the $k$th singularity to the right of $abc$. The
negative sign signifies the fact that the direction of $abcha$ is clockwise and,
therefore, negative. Thus, we have proven the inversion formula.

If $f(t)$ is absolutely integrable, i.e.,

$$\int_{-\infty}^{\infty} |f(t)|\, dt < \infty$$

then we shall define

$$\mathscr{F}(\omega) = \int_{-\infty}^{\infty} f(t) \exp(-j\omega t)\, dt \tag{C.13}$$

as the Fourier transform of $f(t)$. It can be shown that given $\mathscr{F}(\omega)$, $f(t)$
satisfies:

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathscr{F}(\omega) \exp(j\omega t)\, d\omega \tag{C.14}$$

[Table: Basic Fourier Transform Pairs]

Equations (C.13) and (C.14) are called the Fourier transform pair. Now if the
Fourier transform of $f(t)$ exists, then for a fixed $\sigma$ for which $f(t) \exp(-\sigma t)$
is absolutely integrable, the Fourier transform of $f(t) \exp(-\sigma t)$ also exists. Then

$$\int_{-\infty}^{\infty} \left[ f(t) \exp(-\sigma t) \right] \exp(-j\omega t)\, dt = \int_{-\infty}^{\infty} f(t) \exp[-(\sigma + j\omega)t]\, dt$$

Let $s = \sigma + j\omega$ and denote the right-hand side of the equation as $F(\sigma + j\omega)$ or
$F(s)$. The function $f(t) \exp(-\sigma t)$, given its Fourier
transform $F(\sigma + j\omega)$, is:

$$f(t) \exp(-\sigma t) = \mathscr{F}^{-1}[F(\sigma + j\omega)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\sigma + j\omega) \exp(j\omega t)\, d\omega$$

The last equation utilizes the inversion formula of a Fourier transform.
Multiplying both sides of the equation by $\exp(\sigma t)$, we get:

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\sigma + j\omega) \exp[(\sigma + j\omega)t]\, d\omega$$

Now making the change of variable $s = \sigma + j\omega$, so that $ds = j\, d\omega$, will yield:

$$f(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} F(s) \exp(st)\, ds \tag{C.15}$$

However, $F(s)$ is exactly the bilateral transform $F_B(s)$. Thus, we shall utilize
the bilateral Laplace transform inversion formula.
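
The inversion rule (C.11) and (C.12) can be tried on a simple transform. The sketch below assumes the illustrative pair $f(t) = \exp(-|t|)$, $F_B(s) = 2/(1 - s^2)$ with region of convergence $-1 < \sigma < 1$; this pair is chosen for the example and is not taken from the text. Residues of $F_B(s)\exp(st)$ are estimated with a small circular contour around each pole, and the helper names are hypothetical.

```python
import cmath
import math

def residue(g, pole, radius=0.5, n=2000):
    """Residue of g at an isolated pole, computed as (1/(2*pi*j)) times the
    counterclockwise integral of g over a small circle around the pole."""
    total = 0j
    for k in range(n):
        e = cmath.exp(2j * math.pi * k / n)
        total += g(pole + radius * e) * radius * e
    return total / n

def F_B(s):
    # Assumed example: bilateral transform of exp(-|t|), valid for -1 < Re(s) < 1.
    return 2.0 / (1.0 - s * s)

LEFT_POLES = [-1.0]    # singularities to the left of the line Re(s) = d
RIGHT_POLES = [1.0]    # singularities to the right of that line

def invert(t):
    """f(t) from the residues of F_B(s) * exp(s*t)."""
    g = lambda s: F_B(s) * cmath.exp(s * t)
    if t >= 0:
        # Close the contour to the left: plus the sum of left-hand residues.
        return sum(residue(g, p) for p in LEFT_POLES).real
    # Close the contour to the right (clockwise): minus the right-hand residues.
    return -sum(residue(g, p) for p in RIGHT_POLES).real

values = {t: invert(t) for t in (-2.0, -0.5, 0.0, 0.5, 2.0)}
```

Each computed value should agree with $\exp(-|t|)$ to within the accuracy of the numerical contour integration.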

Note that the inversions of both the Fourier and bilateral transforms are unique
and, if the Fourier transform of a waveform $f(t)$ exists, so does its bilateral
transform. The bilateral transform $F_B(s)$ can be obtained from the Fourier transform,
written as a function $\mathscr{F}(j\omega)$ of $j\omega$, in a unique manner by substituting:

$$F_B(s) = \mathscr{F}(j\omega)\big|_{s = j\omega}$$

APPENDIX D

A SPECIAL VECTOR SPACE 


Let $V_N$ be an $N$-dimensional vector space over the complex field. Let
$\{f_1, f_2, \ldots, f_N\} = \{f_i\}_{i=1}^{N}$ be a basis in $V_N$. If there is an inner product
defined with an associated norm, then it is a standard result that
$\{f_1, f_2, \ldots, f_N\}$ can be orthonormalized. That is, there is a basis $\{e_i\}_{i=1}^{N}$ such that:

$$(e_i, e_j) = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases}$$

Now for any vector $x \in V_N$, it can be shown that:

$$x = \sum_{i=1}^{N} (x, e_i)\, e_i$$

and

$$\|x\|^2 = \sum_{i=1}^{N} |(x, e_i)|^2$$


The idea of orthonormalization can be extended to the infinite dimensional
case.

An Infinite Dimensional Vector Space

Let $L_2$ denote the set of all piecewise continuous functions over $[0, 2\pi]$
such that:

$$\int_0^{2\pi} |f(t)|^2\, dt < \infty \tag{D.1}$$

It can be verified that $L_2$ is a vector space under the usual operations on
functions: $(f + g)(t) = f(t) + g(t)$ and $(\alpha f)(t) = \alpha f(t)$.

Now let us define the inner product $(f, g)$ by:

$$(f, g) = \int_0^{2\pi} f(t)\, \overline{g(t)}\, dt \tag{D.2}$$

where the bar denotes the conjugate. Thus, the corresponding norm is given
by:

$$\|f\|^2 = \int_0^{2\pi} |f(t)|^2\, dt \tag{D.3}$$


A simple computation shows that $\exp(jnt)$ for $n = 0, \pm 1, \pm 2, \ldots$ are mutually
orthogonal in $L_2$; specifically:

$$(\exp(jmt), \exp(jnt)) = \begin{cases} 0, & \text{if } m \neq n \\ 2\pi, & \text{if } m = n \end{cases} \tag{D.4}$$

Moreover, we can orthonormalize the collection

$$\{\exp(jnt)\}_{n=-\infty}^{\infty}$$

by letting

$$e_n(t) = \frac{1}{\sqrt{2\pi}} \exp(jnt)$$

$L_2$ with an orthonormal basis is said to be a complete space. Recall any
finite-dimensional vector space is complete.
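
The orthonormality of the functions $e_n(t) = \exp(jnt)/\sqrt{2\pi}$ under the inner product (D.2) can be checked with a simple numerical sum. The following sketch is illustrative; the helper names `inner` and `e` and the grid size are assumptions made for this example.

```python
import cmath
import math

def inner(f, g, n=4096):
    """Approximate (f, g) = integral over [0, 2*pi] of f(t) * conj(g(t)) dt
    with an equally spaced Riemann sum."""
    h = 2 * math.pi / n
    return sum(f(k * h) * g(k * h).conjugate() for k in range(n)) * h

def e(n):
    """Normalized exponential e_n(t) = exp(j*n*t) / sqrt(2*pi)."""
    return lambda t: cmath.exp(1j * n * t) / math.sqrt(2 * math.pi)

same = inner(e(3), e(3))         # expected to be close to 1
different = inner(e(3), e(-2))   # expected to be close to 0
```

For periodic integrands such as these, the equally spaced sum is accurate to rounding error, so the two values land very close to 1 and 0.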

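Let $H$ be a subspace of $L_2$ which is generated by

$$\{e_n\}_{n=-\infty}^{\infty}$$

that is, $H$ consists of all linear combinations of the form

$$\sum_{n=-\infty}^{\infty} \alpha_n e_n$$

where the $\alpha_n$'s are scalars.

Now for every $f \in H$, we can write:

$$f(t) = \sum_{n=-\infty}^{\infty} a_n e_n(t) \tag{D.5}$$

where $a_n = (f, e_n)$, and $a_n$ can be written as:

$$a_n = (f, e_n) = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(t) \exp\{-jnt\}\, dt \tag{D.6}$$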


Thus, from Eqs. (D.3) and (D.5), it is easy to verify that:

$$\|f\|^2 = \sum_{n=-\infty}^{\infty} |(f, e_n)|^2 \tag{D.7}$$

Now remembering that:

$$\|f\|^2 = \int_0^{2\pi} |f(t)|^2\, dt$$

and utilizing the fact that $a_n = (f, e_n)$, we can rewrite:

$$\int_0^{2\pi} |f(t)|^2\, dt = \sum_{n=-\infty}^{\infty} |a_n|^2 \tag{D.8}$$

Equation (D.8) is called Parseval's equality.

Important Remarks

(1) It must be emphasized that the expansion

$$f(t) = \sum_{n=-\infty}^{\infty} a_n e_n(t) = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} a_n \exp(jnt) \tag{D.9}$$

is not interpreted as saying the series is pointwise converging to the
function. Equation (D.9) actually means that, for $f \in L_2$, the partial sum

$$f_N(t) = \frac{1}{\sqrt{2\pi}} \sum_{n=-N}^{N} a_n \exp(jnt)$$

converges to $f$ in the norm specified in $L_2$. That is:

$$\|f - f_N\| = \left[ \int_0^{2\pi} |f(t) - f_N(t)|^2\, dt \right]^{1/2} \to 0 \quad \text{as } N \to \infty \tag{D.10}$$


(2) If we change $2\pi$ to $T$ and the interval $[0, 2\pi]$ is changed to $(-T/2, T/2]$,
we can then write:

$$f(t) = \sum_{n=-\infty}^{\infty} c_n \exp\{jn\omega_0 t\} \tag{D.11}$$

where $\omega_0 = 2\pi/T$. The functions

$$\{\exp\{jn\omega_0 t\}\}_{n=-\infty}^{\infty}$$

are pairwise orthogonal, since

$$(\exp\{jm\omega_0 t\}, \exp\{jn\omega_0 t\}) = \begin{cases} 0, & \text{if } m \neq n \\ T, & \text{if } m = n \end{cases}$$

Thus, we have:

$$f(t) = \sum_{n=-\infty}^{\infty} a_n e_n(t) = \sum_{n=-\infty}^{\infty} \sqrt{T}\, c_n e_n(t)$$

where $e_n = (1/\sqrt{T}) \exp\{jn\omega_0 t\}$ and $a_n = (f, e_n) = \sqrt{T}\, c_n$.

Parseval's equality becomes:

$$\|f\|^2 = \int_{-T/2}^{T/2} |f(t)|^2\, dt = \sum_{n=-\infty}^{\infty} |a_n|^2 = T \sum_{n=-\infty}^{\infty} |c_n|^2 \tag{D.12}$$

From which, we obtain:

$$\frac{1}{T} \int_{-T/2}^{T/2} |f(t)|^2\, dt = \sum_{n=-\infty}^{\infty} |c_n|^2 \tag{D.13}$$

The last equation is another form of Parseval's equality.
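
Parseval's equality on $[0, 2\pi]$ can be verified numerically for a function with only a few harmonics. The sketch below is illustrative only; the test function $f(t) = 3 + 2\cos t + \sin 2t$, the grid size, and the helper names are assumptions made for this example. For this choice both sides should equal $23\pi$.

```python
import cmath
import math

N_GRID = 4096
H = 2 * math.pi / N_GRID

def f(t):
    # Assumed test function with harmonics n = 0, +/-1, +/-2 only.
    return 3.0 + 2.0 * math.cos(t) + math.sin(2.0 * t)

def coefficient(n):
    """a_n = (f, e_n) = (1/sqrt(2*pi)) * integral of f(t) exp(-j*n*t) over [0, 2*pi]."""
    total = sum(f(k * H) * cmath.exp(-1j * n * k * H) for k in range(N_GRID))
    return total * H / math.sqrt(2 * math.pi)

# Left side: integral of |f(t)|^2 over one period.
energy = sum(abs(f(k * H)) ** 2 for k in range(N_GRID)) * H

# Right side: sum of |a_n|^2 over a range that covers every nonzero harmonic.
coef_energy = sum(abs(coefficient(n)) ** 2 for n in range(-5, 6))
```

Dividing both sides by the period gives the second form of the equality in terms of the coefficients $c_n = a_n / \sqrt{T}$, here with $T = 2\pi$.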




APPENDIX E 

STATE VARIABLES 


Let $X(t)$ be an $n$-vector such that:

$$\dot{X}(t) = A(t) X(t), \quad X(t_0) = X_0 \tag{E.1}$$

where $X(t)$ and $A(t)$ are continuously differentiable and $A(t)$ is an $n \times n$
matrix. The solution of Eq. (E.1) is given by:

$$X(t) = \phi(t, t_0) X(t_0) = \phi(t, t_0) X_0 \tag{E.2a}$$

where

$$\dot{\phi}(t, t_0) = A(t)\, \phi(t, t_0), \quad \phi(t_0, t_0) = I \tag{E.2b}$$


This is easy to verify, since the solution of the differential equation for a
specified initial condition is unique and $X(t)$ in Eq. (E.2a) is a solution with the
initial condition:

$$X(t_0) = \phi(t_0, t_0) X_0 = I X_0 = X_0$$

Two Important Properties

Let $t_1$ and $t_2$ be two different times such that $t_1$ and $t_2$ are $\geq t_0$. Then
we have:

$$X(t_2) = \phi(t_2, t_0) X_0 \tag{E.3}$$

and

$$X(t_1) = \phi(t_1, t_0) X_0 \tag{E.4}$$

Now if the initial condition is at $t_1$, then $X(t_2)$ is given by:

$$X(t_2) = \phi(t_2, t_1) X(t_1) \tag{E.5}$$

Substituting $X(t_1)$ from Eq. (E.4) into Eq. (E.5) yields:

$$X(t_2) = \phi(t_2, t_1)\, \phi(t_1, t_0) X_0 \tag{E.6}$$

Comparing (E.3) and (E.6) gives rise to:

$$\phi(t_2, t_0) = \phi(t_2, t_1)\, \phi(t_1, t_0) \tag{E.7}$$

As a special case of Eq. (E.7), let $t_2 = t_0$. Then

$$\phi(t_0, t_0) = I = \phi(t_0, t_1)\, \phi(t_1, t_0)$$

from which

$$\phi^{-1}(t_1, t_0) = \phi(t_0, t_1) \tag{E.8}$$


Equations (E.7) and (E.8) are very important. It can be verified that $\phi(\cdot,\cdot)$
is nonsingular. From Eq. (E.8) it is obvious that the inverse of $\phi(t_1, t_0)$ is obtained by
changing the arguments $t_1$ and $t_0$ to $t_0$ and $t_1$, respectively.

Example 1

Given

$$\dot{X} = 2X, \quad X(t_0) = X_0$$

solve the differential equation via the transition matrix.

$$\dot{\phi} = 2\phi, \quad \phi(t_0, t_0) = 1$$

will imply that

$$\phi(t, t_0) = \exp\{2(t - t_0)\}$$

Thus,

$$X(t) = \phi(t, t_0) X_0 = X_0 \exp\{2(t - t_0)\}$$


Example 2 

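Repeat Example 1 for:

$$\dot{X} = a(t) X(t), \quad X(t_0) = X_0$$

Solution

$$\dot{\phi} = a(t)\, \phi(t, t_0), \quad \phi(t_0, t_0) = 1$$

implies that $\dot{\phi}/\phi = a(t)$, from which we get:

$$\phi(t, t_0) = \exp\left\{ \int_{t_0}^{t} a(\lambda)\, d\lambda \right\}$$

Thus,

$$X(t) = X_0 \exp\left\{ \int_{t_0}^{t} a(\lambda)\, d\lambda \right\}$$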
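
The closed-form solution of Example 2 can be compared with a direct numerical integration of the differential equation. The sketch below assumes the illustrative choice $a(t) = \cos t$, for which $\phi(t, t_0) = \exp\{\sin t - \sin t_0\}$; the coefficient, the initial data, and the fourth-order Runge-Kutta helper `rk4` are assumptions made for this example.

```python
import math

def a(t):
    # Assumed time-varying coefficient for illustration.
    return math.cos(t)

def phi(t, t0):
    """Scalar transition 'matrix' exp(integral of a from t0 to t) for a(t) = cos(t)."""
    return math.exp(math.sin(t) - math.sin(t0))

def rk4(f, t0, x0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta integration of dx/dt = f(t, x)."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

T0, X0, T1 = 0.0, 2.0, 3.0
closed_form = phi(T1, T0) * X0
numerical = rk4(lambda t, x: a(t) * x, T0, X0, T1)
```

The same `phi` can also be used to spot-check the composition property $\phi(t_2, t_0) = \phi(t_2, t_1)\phi(t_1, t_0)$ and the inverse property $\phi^{-1}(t_1, t_0) = \phi(t_0, t_1)$ in the scalar case.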



General Solution with Forcing Function Inputs 

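Consider the general time-varying differential equation:

$$\dot{X}(t) = A(t) X(t) + B(t) U(t) \tag{E.9a}$$

$$Y(t) = C(t) X(t) + D(t) U(t) \tag{E.9b}$$

Assume the solution $X(t)$ exists and

$$X(t_0) = X_0$$

is the initial condition. We claim $X(t)$ is given by:

$$X(t) = \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, t_0)\, \phi^{-1}(\lambda, t_0) B(\lambda) U(\lambda)\, d\lambda \tag{E.10a}$$

$$\phantom{X(t)} = \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, \lambda) B(\lambda) U(\lambda)\, d\lambda \tag{E.10b}$$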


Let us verify Eq. (E.10). For convenience, we shall not write the arguments in
$t$. Let

$$X(t) = \phi(t, t_0) Z(t) \quad \text{or, equivalently,} \quad Z(t) = \phi^{-1}(t, t_0) X(t) \tag{E.11}$$

Taking the derivative of both sides yields:

$$\dot{X} = \dot{\phi} Z + \phi \dot{Z} \tag{E.12}$$

Equating the right-hand side of Eq. (E.9a) with (E.12) gives rise to:

$$AX + BU = \dot{\phi} Z + \phi \dot{Z} \tag{E.13}$$

Now from $\dot{\phi} = A\phi$ we assert that:

$$\dot{\phi} Z = A \phi Z = AX \tag{E.14}$$

where in the above we have used Eq. (E.11).

Substituting (E.14) into Eq. (E.13) yields:

$$\phi \dot{Z} = BU \quad \text{or, equivalently,} \quad \dot{Z} = \phi^{-1} BU \tag{E.15}$$

where, upon integration, we get:

$$Z(t) = Z(t_0) + \int_{t_0}^{t} \phi^{-1}(\lambda, t_0) B(\lambda) U(\lambda)\, d\lambda \tag{E.16}$$

Utilizing Eq. (E.11) and the fact that $Z(t_0) = X(t_0)$, we obtain:

$$X(t) = \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, t_0)\, \phi^{-1}(\lambda, t_0) B(\lambda) U(\lambda)\, d\lambda \tag{E.17}$$

which concludes the first part of the proof.

To prove the second part, we make use of $\phi^{-1}(\lambda, t_0) = \phi(t_0, \lambda)$, which
implies:

$$\phi(t, t_0)\, \phi^{-1}(\lambda, t_0) = \phi(t, t_0)\, \phi(t_0, \lambda) = \phi(t, \lambda) \tag{E.18}$$

Substituting (E.18) into (E.17) gives rise to

$$X(t) = \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, \lambda) B(\lambda) U(\lambda)\, d\lambda \tag{E.19}$$

which is the desired result.


Substituting (E.17) or (E.19) into (E.9b) will yield the output. Therefore,

$$Y(t) = C(t) X(t) + D(t) U(t) = C(t) \left[ \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, \lambda) B(\lambda) U(\lambda)\, d\lambda \right] + D(t) U(t) \tag{E.20}$$

Thus, the most important part of the solution is acquisition of the transition
matrix $\phi(\cdot,\cdot)$, which is needed to solve for $X(t)$. Once $X(t)$ is known, $Y(t)$ can be
obtained immediately (see E.20).
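
The forced solution $X(t) = \phi(t, t_0) X_0 + \int_{t_0}^{t} \phi(t, \lambda) B(\lambda) U(\lambda)\, d\lambda$ can be checked in the scalar case against a direct numerical integration. The sketch below assumes the illustrative system $\dot{x} = -2x + \sin t$ with $x(0) = 3$; the constants and the helper names `simpson`, `x_formula`, and `x_rk4` are assumptions made for this example.

```python
import math

A_COEF, B_COEF, X0, T0 = -2.0, 1.0, 3.0, 0.0   # assumed scalar system

def u(t):
    return math.sin(t)   # assumed forcing input

def phi(t, lam):
    """Scalar transition function exp{a (t - lam)} for constant a."""
    return math.exp(A_COEF * (t - lam))

def simpson(g, lo, hi, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def x_formula(t):
    """Free response plus the convolution-type integral of the forced response."""
    forced = simpson(lambda lam: phi(t, lam) * B_COEF * u(lam), T0, t)
    return phi(t, T0) * X0 + forced

def x_rk4(t1, steps=2000):
    """Direct Runge-Kutta integration of dx/dt = a x + b u(t)."""
    f = lambda t, x: A_COEF * x + B_COEF * u(t)
    h = (t1 - T0) / steps
    t, x = T0, X0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x

from_formula = x_formula(2.0)
from_ode = x_rk4(2.0)
```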

To obtain $\phi(\cdot,\cdot)$ for the time-varying case is not easy; the general
equation

$$\dot{\phi}(t, t_0) = A(t)\, \phi(t, t_0)$$

must be solved. However, for the time-invariant case, where $A$, $B$, $C$, and
$D$ are constant matrices, the solution is considerably easier. Before discussing
this special case, let us first define:

$$\exp(At) \triangleq I + At + \frac{A^2 t^2}{2!} + \cdots + \frac{A^n t^n}{n!} + \cdots \tag{E.21}$$

Now for the time-invariant case, $\phi(t, t_0)$ becomes:

$$\phi(t, t_0) = \exp\{A(t - t_0)\} \tag{E.22}$$


To verify (E.22) is very simple since

$$\frac{d}{dt} \exp\{A(t - t_0)\} = A \exp\{A(t - t_0)\}$$

with $\phi(t_0, t_0) = \exp\{A \cdot 0\} = I$. Now, without any loss of generality, assume $t_0 = 0$
and let us state the following claim.

The transition matrix $\exp\{At\}$ is obtained as:

$$\exp\{At\} = \mathscr{L}^{-1}\left[ (sI - A)^{-1} \right] \tag{E.23}$$

Thus, $\exp\{At\}$ is the inverse Laplace transform of $(sI - A)^{-1}$.

The proof is simple. Take the Laplace transform of (E.9a) to get:

$$s \mathscr{X}(s) - X_0 = A \mathscr{X}(s) + B\, \mathscr{U}(s) \tag{E.24}$$

where $\mathscr{X}(s)$ and $\mathscr{U}(s)$ are the corresponding Laplace transforms of $X(\cdot)$ and $U(\cdot)$.
This can be done since $A$ and $B$ are both constant matrices. From (E.24), we
can get:

$$\mathscr{X}(s) = (sI - A)^{-1} X_0 + (sI - A)^{-1} B\, \mathscr{U}(s) \tag{E.25}$$

Taking the inverse Laplace transform of the above and equating the result
with the right-hand side of (E.19) with $t_0 = 0$, we obtain:

$$\exp\{At\} = \mathscr{L}^{-1}\left[ (sI - A)^{-1} \right]$$

as asserted.
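
The claim can be illustrated on a small example. The sketch below assumes the matrix $A = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix}$, for which $(sI - A)^{-1} = \frac{1}{(s+1)(s+2)} \begin{bmatrix} s+3 & 1 \\ -2 & s \end{bmatrix}$; inverting each entry by partial fractions gives the closed form coded in `expm_laplace`. That closed form is compared with the truncated power series for $\exp(At)$. The matrix, the number of series terms, and the function names are assumptions made for this example.

```python
import math

A = [[0.0, 1.0], [-2.0, -3.0]]   # assumed example matrix

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(M, t, terms=60):
    """Truncated power series I + Mt + (Mt)^2/2! + ... for exp(Mt)."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * (M t / k), which equals (M t)^k / k!
        step = [[M[i][j] * t / k for j in range(n)] for i in range(n)]
        term = mat_mul(term, step)
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def expm_laplace(t):
    """Entry-by-entry inverse Laplace transform of (sI - A)^{-1} for the A above."""
    e1, e2 = math.exp(-t), math.exp(-2.0 * t)
    return [[2 * e1 - e2, e1 - e2],
            [-2 * e1 + 2 * e2, -e1 + 2 * e2]]

t_check = 1.5
series = expm_series(A, t_check)
closed = expm_laplace(t_check)
max_diff = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))
```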
Independent events, 4
Instantaneous power, 70
Integrals, see Stochastic integrals
Interpolation function, 90

Jacobian, 15
Joint density, 8
Joint distribution, 8

Empty set, 2
Ergodicity, 49, 56
Error function, 19

Kailath, 255
Kalman, R. E., 108
Kalman-Bucy filtering, 143-167



Laplace, see bilateral Laplace transforms 

Lathi, 254 

Leadbetter, 254 

Liebelt, 254 

Linear estimate, 113

Linear mean-square estimation, 113, 121

Linear systems, 62 

Linear transformation, App. B 

Low-pass signals, 91 

Marginal density function, 8-9 
Marginal distribution function, 8-9 
Markov processes, 143 
Matched Filtering, 140 
Mean, 16 

Mean-square, continuity, 46 
Mean-square-estimation, 110-113 
Meyer, 254 
Moore, 255 
Mutual events, 2 

Nahi, 175, 176, 254 
N-dimensional space, App. B
Noisy observations, 107
Nonlinear systems, 109
Norm, 116, App. B
Normal random variables, 7
nth order density, 14
nth order distribution, 14

Operator, 61
    linear, App. B
Optimum filter, 123
Orthogonal process, 46, 86
Orthogonality principle, 115
Outcomes, 1, 8 

Papoulis, 254 

Partial-randomization, 200 
Partitioned image, 200 
Periodic processes, 91 
Poisson process, 47 
Positive definite, App. B 
Powell, 176, 255
Power spectrum, 61, 70, 73
Power spectral density, 71
Prediction, 137, 162
Probability density function, 4-10, 33
Probability distribution function, 4-10.

Random variables, 4
    complex, 17
    discrete, 5
    periodic, 91
Random vectors, 20
Rayleigh density function, 7
Recursive filtering (see Kalman-Bucy filtering)

Recursive image estimation, 180
Realizable systems, 64 
Rhodes, 177, 255 
Risk function, 194 


Sample functions, 29 
Sample space, 1 
Sampling theorem, 87 
Scanner, 180 
Schwarz, 254 
Schwarz inequality, 17 
Second-order statistics, 38 
Silverman, 175, 255
Shannon, 87

Signal-to-noise ratio, 140, 171, 197
Smoothing, 137
Spectral factorization, 176
State-variables, 110, App. E
Standard deviation, 16
Stationary correlation function, 41-42
    wide-sense, 42
Stieltjes integrals, 16
Stochastic continuity, 46 
Stochastic differentiation, 46-48 
Stochastic integration, 49-55 
Stochastic processes, 29-32 
Systems 

dynamics, 64 
instantaneous, 64
lumped, 64 
memory, 64 

Systems and modeling, 108 


Time averages, 56 
Time-invariant systems, 65 
Transition matrix, 146, 178, App. E 
Transformation linear, App. B 
Two-dimensional signals, 175 


Quadratic mean, 18 
Quadrature component, 103

Unbiased estimates, 114
Uncorrelated processes, 46
Uncorrelated random variables, 29
Uniform density functions, 7

Variance, 16

Vector spaces, 116, App. B

White noise, 82
Wide-sense stationarity, 42
Wiener, 108

Wiener-Hopf equation, 124-125, 131
Wiener-Kolmogorov theory, 122, 130
Wong, 254